From 07d241fd66ba99111d43a0a4c4abeeb972468d1d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:47 +0100
Subject: docs: filesystems: convert 9p.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/96a060b7b5c0c3838ab1751addfe4d6d3bc37bd6.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/9p.rst    | 185 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/9p.txt    | 161 -------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 186 insertions(+), 161 deletions(-)
 create mode 100644 Documentation/filesystems/9p.rst
 delete mode 100644 Documentation/filesystems/9p.txt
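The virtio usage section of 9p.rst says that the available mount tags can be found by reading the mount_tag files under /sys/bus/virtio/drivers/9pnet_virtio/. Below is a minimal userspace sketch of that lookup in C. The function name list_9p_mount_tags() is hypothetical. Stripping a trailing newline or NUL is an assumption, made so the code works whichever form the sysfs attribute uses.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

/*
 * Print every 9p mount tag exported to this guest, one per line.
 * Returns the number of tags printed (0 if none are exported).
 */
int list_9p_mount_tags(FILE *out)
{
	glob_t g = { 0 };
	size_t i;
	int count = 0;

	assert(out != NULL);
	if (glob("/sys/bus/virtio/drivers/9pnet_virtio/virtio*/mount_tag",
		 0, NULL, &g) != 0) {
		globfree(&g);	/* no match (or glob error): report no tags */
		return 0;
	}

	for (i = 0; i < g.gl_pathc; i++) {
		char tag[256];
		size_t n;
		FILE *f = fopen(g.gl_pathv[i], "r");

		if (f == NULL)
			continue;
		n = fread(tag, 1, sizeof(tag) - 1, f);
		fclose(f);
		tag[n] = '\0';
		/* Assumed: the tag may be followed by '\n' or '\0'; drop either. */
		tag[strcspn(tag, "\n")] = '\0';
		fprintf(out, "%s\n", tag);
		count++;
	}
	globfree(&g);
	return count;
}
```

Each printed tag is a candidate source argument for `mount -t 9p -o trans=virtio <mount_tag> /mnt/9`.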
diff --git a/Documentation/filesystems/9p.rst b/Documentation/filesystems/9p.rst
new file mode 100644
index 000000000000..f054d1c45e86
--- /dev/null
+++ b/Documentation/filesystems/9p.rst
@@ -0,0 +1,185 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+v9fs: Plan 9 Resource Sharing for Linux
+=======================================
+
+About
+=====
+
+v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
+
+This software was originally developed by Ron Minnich <rminnich@sandia.gov>
+and Maya Gokhale.  Additional development was done by Greg Watson
+<gwatson@lanl.gov> and most recently Eric Van Hensbergen
+<ericvh@gmail.com>, Latchesar Ionkov <lucho@ionkov.net> and Russ Cox
+<rsc@swtch.com>.
+
+The most detailed explanation of the Linux implementation and applications of
+the 9p client is available in the form of a USENIX paper:
+
+   http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html
+
+Other applications are described in the following papers:
+
+	* XCPU & Clustering
+	  http://xcpu.org/papers/xcpu-talk.pdf
+	* KVMFS: control file system for KVM
+	  http://xcpu.org/papers/kvmfs.pdf
+	* CellFS: A New Programming Model for the Cell BE
+	  http://xcpu.org/papers/cellfs-talk.pdf
+	* PROSE I/O: Using 9p to enable Application Partitions
+	  http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
+	* VirtFS: A Virtualization Aware File System pass-through
+	  http://goo.gl/3WPDg
+
+Usage
+=====
+
+For remote file server::
+
+	mount -t 9p 10.10.1.2 /mnt/9
+
+For Plan 9 From User Space applications (http://swtch.com/plan9)::
+
+	mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER
+
+For server running on QEMU host with virtio transport::
+
+	mount -t 9p -o trans=virtio <mount_tag> /mnt/9
+
+where mount_tag is the tag that the server associates with each of the
+exported mount points. Each 9P export is seen by the client as a virtio
+device with an associated "mount_tag" property. Available mount tags can be
+seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files.
+
+Options
+=======
+
+  ============= ===============================================================
+  trans=name	select an alternative transport.  Valid options are
+  		currently:
+
+			========  ============================================
+			unix 	  specifying a named pipe mount point
+			tcp	  specifying a normal TCP/IP connection
+			fd   	  use passed file descriptors for connection
+                                  (see rfdno and wfdno)
+			virtio	  connect to the next virtio channel available
+				  (from QEMU with trans_virtio module)
+			rdma	  connect to a specified RDMA channel
+			========  ============================================
+
+  uname=name	user name to attempt mount as on the remote server.  The
+  		server may override or ignore this value.  Certain user
+		names may require authentication.
+
+  aname=name	aname specifies the file tree to access when the server is
+  		offering several exported file systems.
+
+  cache=mode	specifies a caching policy.  By default, no caches are used.
+
+                        none
+				default no cache policy, metadata and data
+                                alike are synchronous.
+			loose
+				no attempts are made at consistency,
+                                intended for exclusive, read-only mounts
+                        fscache
+				use FS-Cache for a persistent, read-only
+				cache backend.
+                        mmap
+				minimal cache that is only used for read-write
+                                mmap.  Nothing else is cached, like cache=none
+
+  debug=n	specifies debug level.  The debug level is a bitmask.
+
+			=====   ================================
+			0x01    display verbose error messages
+			0x02    developer debug (DEBUG_CURRENT)
+			0x04    display 9p trace
+			0x08    display VFS trace
+			0x10    display Marshalling debug
+			0x20    display RPC debug
+			0x40    display transport debug
+			0x80    display allocation debug
+			0x100   display protocol message debug
+			0x200   display Fid debug
+			0x400   display packet debug
+			0x800   display fscache tracing debug
+			=====   ================================
+
+  rfdno=n	the file descriptor for reading with trans=fd
+
+  wfdno=n	the file descriptor for writing with trans=fd
+
+  msize=n	the number of bytes to use for 9p packet payload
+
+  port=n	port to connect to on the remote server
+
+  noextend	force legacy mode (no 9p2000.u or 9p2000.L semantics)
+
+  version=name	Select 9P protocol version. Valid options are:
+
+			========        ==============================
+			9p2000          Legacy mode (same as noextend)
+			9p2000.u        Use 9P2000.u protocol
+			9p2000.L        Use 9P2000.L protocol
+			========        ==============================
+
+  dfltuid	attempt to mount as a particular uid
+
+  dfltgid	attempt to mount with a particular gid
+
+  afid		security channel - used by Plan 9 authentication protocols
+
+  nodevmap	do not map special files - represent them as normal files.
+  		This can be used to share devices/named pipes/sockets between
+		hosts.  This functionality will be expanded in later versions.
+
+  access	there are four access modes.
+			user
+				if a user tries to access a file on v9fs
+			        filesystem for the first time, v9fs sends an
+			        attach command (Tattach) for that user.
+				This is the default mode.
+			<uid>
+				allows only the user with uid=<uid> to access
+				the files on the mounted filesystem
+			any
+				v9fs does single attach and performs all
+				operations as one user
+			client
+				 ACL based access check on the 9p client
+			         side for access validation
+
+  cachetag	cache tag to use the specified persistent cache.
+		cache tags for existing cache sessions can be listed at
+		/sys/fs/9p/caches. (applies only to cache=fscache)
+  ============= ===============================================================
+
+Resources
+=========
+
+Protocol specifications are maintained on GitHub:
+http://ericvh.github.com/9p-rfc/
+
+9p client and server implementations are listed on
+http://9p.cat-v.org/implementations
+
+A 9p2000.L server is being developed by LLNL and can be found
+at http://code.google.com/p/diod/
+
+There are user and developer mailing lists available through the v9fs project
+on sourceforge (http://sourceforge.net/projects/v9fs).
+
+News and other information is maintained on a Wiki
+(http://sf.net/apps/mediawiki/v9fs/index.php).
+
+Bug reports are best issued via the mailing list.
+
+For more information on the Plan 9 Operating System check out
+http://plan9.bell-labs.com/plan9
+
+For information on Plan 9 from User Space (Plan 9 applications and libraries
+ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
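The Options section of 9p.rst describes the comma-separated string given to `mount -t 9p -o`. A program can pass the same string as the data argument of mount(2). The sketch below is illustrative only:

- The DBG_* names and both helper functions are hypothetical.
- The bit values come from the debug=n table.
- Formatting debug in hexadecimal assumes the option parser accepts the 0x form used in that table.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Illustrative names for some bits of the debug=n bitmask in 9p.rst. */
enum {
	DBG_ERROR = 0x01,	/* display verbose error messages */
	DBG_9P    = 0x04,	/* display 9p trace */
	DBG_VFS   = 0x08,	/* display VFS trace */
	DBG_TRANS = 0x40,	/* display transport debug */
};

/*
 * Format the option string that mount(2) takes as its "data" argument.
 * Returns the string length, or -1 if a value contains a comma (it would
 * split into a separate option) or the buffer is too small.
 */
int build_9p_opts(char *buf, size_t len, const char *trans,
		  const char *version, unsigned int debug)
{
	int n;

	assert(buf != NULL && trans != NULL && version != NULL);
	if (strchr(trans, ',') != NULL || strchr(version, ',') != NULL)
		return -1;
	n = snprintf(buf, len, "trans=%s,version=%s,debug=0x%x",
		     trans, version, debug);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/*
 * Roughly "mount -t 9p -o trans=virtio,version=9p2000.L,debug=... tag target".
 * Requires CAP_SYS_ADMIN.  Returns 0 on success, -1 with errno set on error.
 */
int mount_9p_virtio(const char *tag, const char *target, unsigned int debug)
{
	char opts[128];

	if (build_9p_opts(opts, sizeof(opts), "virtio", "9p2000.L", debug) < 0) {
		errno = EINVAL;
		return -1;
	}
	return mount(tag, target, "9p", 0, opts);
}
```

For example, DBG_ERROR | DBG_9P produces "trans=virtio,version=9p2000.L,debug=0x5". Rejecting commas keeps a caller-supplied value from slipping in extra options such as cache=loose.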
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt
deleted file mode 100644
index fec7144e817c..000000000000
--- a/Documentation/filesystems/9p.txt
+++ /dev/null
@@ -1,161 +0,0 @@
-	  	    v9fs: Plan 9 Resource Sharing for Linux
-		    =======================================
-
-ABOUT
-=====
-
-v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
-
-This software was originally developed by Ron Minnich <rminnich@sandia.gov>
-and Maya Gokhale.  Additional development by Greg Watson
-<gwatson@lanl.gov> and most recently Eric Van Hensbergen
-<ericvh@gmail.com>, Latchesar Ionkov <lucho@ionkov.net> and Russ Cox
-<rsc@swtch.com>.
-
-The best detailed explanation of the Linux implementation and applications of
-the 9p client is available in the form of a USENIX paper:
-   http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html
-
-Other applications are described in the following papers:
-	* XCPU & Clustering
-		http://xcpu.org/papers/xcpu-talk.pdf
-	* KVMFS: control file system for KVM
-		http://xcpu.org/papers/kvmfs.pdf
-	* CellFS: A New Programming Model for the Cell BE
-		http://xcpu.org/papers/cellfs-talk.pdf
-	* PROSE I/O: Using 9p to enable Application Partitions
-		http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
-	* VirtFS: A Virtualization Aware File System pass-through
-		http://goo.gl/3WPDg
-
-USAGE
-=====
-
-For remote file server:
-
-	mount -t 9p 10.10.1.2 /mnt/9
-
-For Plan 9 From User Space applications (http://swtch.com/plan9)
-
-	mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER
-
-For server running on QEMU host with virtio transport:
-
-	mount -t 9p -o trans=virtio <mount_tag> /mnt/9
-
-where mount_tag is the tag associated by the server to each of the exported
-mount points. Each 9P export is seen by the client as a virtio device with an
-associated "mount_tag" property. Available mount tags can be
-seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files.
-
-OPTIONS
-=======
-
-  trans=name	select an alternative transport.  Valid options are
-  		currently:
-			unix 	- specifying a named pipe mount point
-			tcp	- specifying a normal TCP/IP connection
-			fd   	- used passed file descriptors for connection
-                                (see rfdno and wfdno)
-			virtio	- connect to the next virtio channel available
-				(from QEMU with trans_virtio module)
-			rdma	- connect to a specified RDMA channel
-
-  uname=name	user name to attempt mount as on the remote server.  The
-  		server may override or ignore this value.  Certain user
-		names may require authentication.
-
-  aname=name	aname specifies the file tree to access when the server is
-  		offering several exported file systems.
-
-  cache=mode	specifies a caching policy.  By default, no caches are used.
-                        none = default no cache policy, metadata and data
-                                alike are synchronous.
-			loose = no attempts are made at consistency,
-                                intended for exclusive, read-only mounts
-                        fscache = use FS-Cache for a persistent, read-only
-				cache backend.
-                        mmap = minimal cache that is only used for read-write
-                                mmap.  Northing else is cached, like cache=none
-
-  debug=n	specifies debug level.  The debug level is a bitmask.
-			0x01  = display verbose error messages
-			0x02  = developer debug (DEBUG_CURRENT)
-			0x04  = display 9p trace
-			0x08  = display VFS trace
-			0x10  = display Marshalling debug
-			0x20  = display RPC debug
-			0x40  = display transport debug
-			0x80  = display allocation debug
-			0x100 = display protocol message debug
-			0x200 = display Fid debug
-			0x400 = display packet debug
-			0x800 = display fscache tracing debug
-
-  rfdno=n	the file descriptor for reading with trans=fd
-
-  wfdno=n	the file descriptor for writing with trans=fd
-
-  msize=n	the number of bytes to use for 9p packet payload
-
-  port=n	port to connect to on the remote server
-
-  noextend	force legacy mode (no 9p2000.u or 9p2000.L semantics)
-
-  version=name	Select 9P protocol version. Valid options are:
-			9p2000          - Legacy mode (same as noextend)
-			9p2000.u        - Use 9P2000.u protocol
-			9p2000.L        - Use 9P2000.L protocol
-
-  dfltuid	attempt to mount as a particular uid
-
-  dfltgid	attempt to mount with a particular gid
-
-  afid		security channel - used by Plan 9 authentication protocols
-
-  nodevmap	do not map special files - represent them as normal files.
-  		This can be used to share devices/named pipes/sockets between
-		hosts.  This functionality will be expanded in later versions.
-
-  access	there are four access modes.
-			user  = if a user tries to access a file on v9fs
-			        filesystem for the first time, v9fs sends an
-			        attach command (Tattach) for that user.
-				This is the default mode.
-			<uid> = allows only user with uid=<uid> to access
-				the files on the mounted filesystem
-			any   = v9fs does single attach and performs all
-				operations as one user
-			client = ACL based access check on the 9p client
-			         side for access validation
-
-  cachetag	cache tag to use the specified persistent cache.
-		cache tags for existing cache sessions can be listed at
-		/sys/fs/9p/caches. (applies only to cache=fscache)
-
-RESOURCES
-=========
-
-Protocol specifications are maintained on github:
-http://ericvh.github.com/9p-rfc/
-
-9p client and server implementations are listed on
-http://9p.cat-v.org/implementations
-
-A 9p2000.L server is being developed by LLNL and can be found
-at http://code.google.com/p/diod/
-
-There are user and developer mailing lists available through the v9fs project
-on sourceforge (http://sourceforge.net/projects/v9fs).
-
-News and other information is maintained on a Wiki.
-(http://sf.net/apps/mediawiki/v9fs/index.php).
-
-Bug reports are best issued via the mailing list.
-
-For more information on the Plan 9 Operating System check out
-http://plan9.bell-labs.com/plan9
-
-For information on Plan 9 from User Space (Plan 9 applications and libraries
-ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 45d791905e91..a9330c3f8c2e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -46,6 +46,7 @@ Documentation for filesystem implementations.
 .. toctree::
    :maxdepth: 2
 
+   9p
    autofs
    fuse
    overlayfs
-- 
cgit 


From 348739003d4f7e777ef935a44a91e7494f8ab786 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:48 +0100
Subject: docs: filesystems: convert adfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust section titles;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/15ee92f03ec917e5d26bd7b863565dec88c843f6.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/adfs.rst  | 108 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/adfs.txt  |  99 ---------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 109 insertions(+), 99 deletions(-)
 create mode 100644 Documentation/filesystems/adfs.rst
 delete mode 100644 Documentation/filesystems/adfs.txt

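The permission mapping table in adfs.rst can be read as a small function. Each ADFS bit expands to the same bit for all three Linux classes. Owner bits are then filtered through ownmask, and other bits through othmask. The C sketch below models only that table, for regular files. The ATT_* values are hypothetical stand-ins, not the on-disk ADFS attribute layout. Directories and special file types are not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical attribute bits, for illustration only. */
#define ATT_OWNER_READ   0x1u
#define ATT_OWNER_WRITE  0x2u
#define ATT_OTHER_READ   0x4u
#define ATT_OTHER_WRITE  0x8u

/*
 * Map ADFS owner/other read/write attributes to Linux permission bits.
 * "unixexec" says whether the file has the UnixExec file type.
 */
mode_t adfs_perm_to_mode(unsigned int atts, bool unixexec,
			 mode_t ownmask, mode_t othmask)
{
	/* A read bit grants r to user, group and other (plus x for UnixExec). */
	mode_t rbits = S_IRUSR | S_IRGRP | S_IROTH;
	/* A write bit grants w to user, group and other. */
	const mode_t wbits = S_IWUSR | S_IWGRP | S_IWOTH;
	mode_t mode = 0;

	/* The masks should contain permission bits only. */
	assert((ownmask & ~(mode_t)0777) == 0 && (othmask & ~(mode_t)0777) == 0);

	if (unixexec)
		rbits |= S_IXUSR | S_IXGRP | S_IXOTH;

	/* Owner attributes are cut down by ownmask... */
	if (atts & ATT_OWNER_READ)
		mode |= rbits & ownmask;
	if (atts & ATT_OWNER_WRITE)
		mode |= wbits & ownmask;
	/* ...and other attributes by othmask. */
	if (atts & ATT_OTHER_READ)
		mode |= rbits & othmask;
	if (atts & ATT_OTHER_WRITE)
		mode |= wbits & othmask;
	return mode;
}
```

An owner read/write file maps to the following modes, matching the examples in adfs.rst:

- default masks: 0600 (-rw-------)
- ownmask=0770,othmask=0007: 0660
- ownmask=0755,othmask=0577: 0644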
diff --git a/Documentation/filesystems/adfs.rst b/Documentation/filesystems/adfs.rst
new file mode 100644
index 000000000000..5b22cae38e5e
--- /dev/null
+++ b/Documentation/filesystems/adfs.rst
@@ -0,0 +1,108 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Acorn Disc Filing System - ADFS
+===============================
+
+Filesystems supported by ADFS
+-----------------------------
+
+The ADFS module supports the following Filecore formats which have:
+
+- new maps
+- new directories or big directories
+
+In terms of the named formats, this means we support:
+
+- E and E+, with or without boot block
+- F and F+
+
+We fully support reading files from these filesystems, and writing to
+existing files within their existing allocation.  Essentially, we do
+not support changing any of the filesystem metadata.
+
+This is intended to support loopback mounted Linux native filesystems
+on a RISC OS Filecore filesystem, but will allow the data within files
+to be changed.
+
+If write support (ADFS_FS_RW) is configured, we allow rudimentary
+directory updates, specifically updating the access mode and timestamp.
+
+Mount options for ADFS
+----------------------
+
+  ============  ======================================================
+  uid=nnn	All files in the partition will be owned by
+		user id nnn.  Default 0 (root).
+  gid=nnn	All files in the partition will be in group
+		nnn.  Default 0 (root).
+  ownmask=nnn	The permission mask for ADFS 'owner' permissions
+		will be nnn.  Default 0700.
+  othmask=nnn	The permission mask for ADFS 'other' permissions
+		will be nnn.  Default 0077.
+  ftsuffix=n	When ftsuffix=0, no file type suffix will be applied.
+		When ftsuffix=1, a hexadecimal suffix corresponding to
+		the RISC OS file type will be added.  Default 0.
+  ============  ======================================================
+
+Mapping of ADFS permissions to Linux permissions
+------------------------------------------------
+
+  ADFS permissions consist of the following:
+
+	- Owner read
+	- Owner write
+	- Other read
+	- Other write
+
+  (In older versions, an 'execute' permission did exist, but this
+  does not hold the same meaning as the Linux 'execute' permission
+  and is now obsolete).
+
+  The mapping is performed as follows::
+
+	Owner read				-> -r--r--r--
+	Owner write				-> --w--w--w-
+	Owner read and filetype UnixExec	-> ---x--x--x
+    These are then masked by ownmask, eg 700	-> -rwx------
+	Possible owner mode permissions		-> -rwx------
+
+	Other read				-> -r--r--r--
+	Other write				-> --w--w--w-
+	Other read and filetype UnixExec	-> ---x--x--x
+    These are then masked by othmask, eg 077	-> ----rwxrwx
+	Possible other mode permissions		-> ----rwxrwx
+
+  Hence, with the default masks, if a file is owner read/write, and
+  not a UnixExec filetype, then the permissions will be::
+
+			-rw-------
+
+  However, if the masks were ownmask=0770,othmask=0007, then this would
+  be modified to::
+
+			-rw-rw----
+
+  There is no restriction on what you can do with these masks.  You may
+  wish that either read bit gives read access to the file for everyone,
+  while keeping the default write protection (ownmask=0755,othmask=0577)::
+
+			-rw-r--r--
+
+  You can therefore tailor the permission translation to produce
+  whatever permissions you want under Linux.
+
+RISC OS file type suffix
+------------------------
+
+  RISC OS file types are stored in bits 19..8 of the file load address.
+
+  To enable non-RISC OS systems to be used to store files without losing
+  file type information, a file naming convention was devised (initially
+  for use with NFS) such that a hexadecimal suffix of the form ,xyz
+  denoted the file type: e.g. BasicFile,ffb is a BASIC (0xffb) file.  This
+  naming convention is now also used by RISC OS emulators such as RPCEmu.
+
+  Mounting an ADFS disc with option ftsuffix=1 will cause appropriate file
+  type suffixes to be appended to file names read from a directory.  If the
+  ftsuffix option is zero or omitted, no file type suffixes will be added.
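The file type suffix rule in adfs.rst can be sketched in C. The helper names are hypothetical. The text only states where the type is stored. The check that the top 12 bits of the load address are all set is an added assumption, based on the RISC OS convention for "stamped" files: a load address that is not stamped holds a real address rather than a file type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return the 12-bit RISC OS file type held in bits 19..8 of a load
 * address, or -1 if the address is not stamped with a file type.
 */
int riscos_filetype(uint32_t load)
{
	if ((load & 0xfff00000u) != 0xfff00000u)
		return -1;
	return (int)((load >> 8) & 0xfff);
}

/*
 * Build "name,xyz" as ftsuffix=1 would present it; names without a
 * file type are copied unchanged.  Returns the snprintf() result.
 */
int add_ftsuffix(char *buf, size_t len, const char *name, uint32_t load)
{
	int type = riscos_filetype(load);

	assert(buf != NULL && name != NULL);
	if (type < 0)
		return snprintf(buf, len, "%s", name);
	/* Three lowercase hex digits, e.g. ",ffb" for a BASIC file. */
	return snprintf(buf, len, "%s,%03x", name, (unsigned int)type);
}
```

A load address of 0xfffffb00 gives the type 0xffb, and hence the name BasicFile,ffb from the example in adfs.rst.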
diff --git a/Documentation/filesystems/adfs.txt b/Documentation/filesystems/adfs.txt
deleted file mode 100644
index 0baa8e8c1fc1..000000000000
--- a/Documentation/filesystems/adfs.txt
+++ /dev/null
@@ -1,99 +0,0 @@
-Filesystems supported by ADFS
------------------------------
-
-The ADFS module supports the following Filecore formats which have:
-
-- new maps
-- new directories or big directories
-
-In terms of the named formats, this means we support:
-
-- E and E+, with or without boot block
-- F and F+
-
-We fully support reading files from these filesystems, and writing to
-existing files within their existing allocation.  Essentially, we do
-not support changing any of the filesystem metadata.
-
-This is intended to support loopback mounted Linux native filesystems
-on a RISC OS Filecore filesystem, but will allow the data within files
-to be changed.
-
-If write support (ADFS_FS_RW) is configured, we allow rudimentary
-directory updates, specifically updating the access mode and timestamp.
-
-Mount options for ADFS
-----------------------
-
-  uid=nnn	All files in the partition will be owned by
-		user id nnn.  Default 0 (root).
-  gid=nnn	All files in the partition will be in group
-		nnn.  Default 0 (root).
-  ownmask=nnn	The permission mask for ADFS 'owner' permissions
-		will be nnn.  Default 0700.
-  othmask=nnn	The permission mask for ADFS 'other' permissions
-		will be nnn.  Default 0077.
-  ftsuffix=n	When ftsuffix=0, no file type suffix will be applied.
-		When ftsuffix=1, a hexadecimal suffix corresponding to
-		the RISC OS file type will be added.  Default 0.
-
-Mapping of ADFS permissions to Linux permissions
-------------------------------------------------
-
-  ADFS permissions consist of the following:
-
-	Owner read
-	Owner write
-	Other read
-	Other write
-
-  (In older versions, an 'execute' permission did exist, but this
-   does not hold the same meaning as the Linux 'execute' permission
-   and is now obsolete).
-
-  The mapping is performed as follows:
-
-	Owner read				-> -r--r--r--
-	Owner write				-> --w--w---w
-	Owner read and filetype UnixExec	-> ---x--x--x
-    These are then masked by ownmask, eg 700	-> -rwx------
-	Possible owner mode permissions		-> -rwx------
-
-	Other read				-> -r--r--r--
-	Other write				-> --w--w--w-
-	Other read and filetype UnixExec	-> ---x--x--x
-    These are then masked by othmask, eg 077	-> ----rwxrwx
-	Possible other mode permissions		-> ----rwxrwx
-
-  Hence, with the default masks, if a file is owner read/write, and
-  not a UnixExec filetype, then the permissions will be:
-
-			-rw-------
-
-  However, if the masks were ownmask=0770,othmask=0007, then this would
-  be modified to:
-			-rw-rw----
-
-  There is no restriction on what you can do with these masks.  You may
-  wish that either read bits give read access to the file for all, but
-  keep the default write protection (ownmask=0755,othmask=0577):
-
-			-rw-r--r--
-
-  You can therefore tailor the permission translation to whatever you
-  desire the permissions should be under Linux.
-
-RISC OS file type suffix
-------------------------
-
-  RISC OS file types are stored in bits 19..8 of the file load address.
-
-  To enable non-RISC OS systems to be used to store files without losing
-  file type information, a file naming convention was devised (initially
-  for use with NFS) such that a hexadecimal suffix of the form ,xyz
-  denoted the file type: e.g. BasicFile,ffb is a BASIC (0xffb) file.  This
-  naming convention is now also used by RISC OS emulators such as RPCEmu.
-
-  Mounting an ADFS disc with option ftsuffix=1 will cause appropriate file
-  type suffixes to be appended to file names read from a directory.  If the
-  ftsuffix option is zero or omitted, no file type suffixes will be added.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index a9330c3f8c2e..14dc89c94822 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -47,6 +47,7 @@ Documentation for filesystem implementations.
    :maxdepth: 2
 
    9p
+   adfs
    autofs
    fuse
    overlayfs
-- 
cgit 


From 7627216830d808572fff8225964e9209249ba196 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:49 +0100
Subject: docs: filesystems: convert affs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Add table markups;
- Mark literal blocks as such;
- Some whitespace fixes and new line breaks;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
Link: https://lore.kernel.org/r/b44c56befe0e28cbc0eb1b3e281ad7d99737ff16.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/affs.rst  | 246 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/affs.txt  | 222 --------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 247 insertions(+), 222 deletions(-)
 create mode 100644 Documentation/filesystems/affs.rst
 delete mode 100644 Documentation/filesystems/affs.txt

diff --git a/Documentation/filesystems/affs.rst b/Documentation/filesystems/affs.rst
new file mode 100644
index 000000000000..7f1a40dce6d3
--- /dev/null
+++ b/Documentation/filesystems/affs.rst
@@ -0,0 +1,246 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================
+Overview of Amiga Filesystems
+=============================
+
+Not all varieties of the Amiga filesystems are supported for reading and
+writing. The Amiga currently knows six different filesystems:
+
+==============	===============================================================
+DOS\0		The old or original filesystem, not really suited for
+		hard disks and normally not used on them, either.
+		Supported read/write.
+
+DOS\1		The original Fast File System. Supported read/write.
+
+DOS\2		The old "international" filesystem. International means that
+		a bug has been fixed so that accented ("international") letters
+		in file names are case-insensitive, as they ought to be.
+		Supported read/write.
+
+DOS\3		The "international" Fast File System.  Supported read/write.
+
+DOS\4		The original filesystem with directory cache. The directory
+		cache speeds up directory accesses on floppies considerably,
+		but slows down file creation/deletion. Doesn't make much
+		sense on hard disks. Supported read only.
+
+DOS\5		The Fast File System with directory cache. Supported read only.
+==============	===============================================================
+
+All of the above filesystems allow block sizes from 512 to 32K bytes.
+Linux supports block sizes of 512, 1024, 2048 and 4096 bytes. Larger blocks
+speed up almost everything at the expense of wasted disk space. The speed
+gain above 4K does not seem worth the price, so you don't lose too
+much here, either.
+
+The muFS (multi user File System) equivalents of the above file systems
+are supported, too.
+
+Mount options for the AFFS
+==========================
+
+protect
+		If this option is set, the protection bits cannot be altered.
+
+setuid[=uid]
+		This sets the owner of all files and directories in the file
+		system to uid or the uid of the current user, respectively.
+
+setgid[=gid]
+		Same as above, but for gid.
+
+mode=mode
+		Sets the mode flags to the given (octal) value, regardless
+		of the original permissions. Directories will get an x
+		permission if the corresponding r bit is set.
+		This is useful since most of the plain AmigaOS files
+		will map to 600.
+
+nofilenametruncate
+		The file system will return an error when a filename exceeds
+		the standard maximum filename length (30 characters).
+
+reserved=num
+		Sets the number of reserved blocks at the start of the
+		partition to num. You should never need this option.
+		Default is 2.
+
+root=block
+		Sets the block number of the root block. This should never
+		be necessary.
+
+bs=blksize
+		Sets the blocksize to blksize. Valid block sizes are 512,
+		1024, 2048 and 4096. Like the root option, this should
+		never be necessary, as the affs can figure it out itself.
+
+quiet
+		The file system will not return an error for disallowed
+		mode changes.
+
+verbose
+		The volume name, file system type and block size will
+		be written to the syslog when the filesystem is mounted.
+
+mufs
+		The filesystem is really a muFS, although it doesn't
+		identify itself as one. This option is necessary if
+		the filesystem wasn't formatted as muFS, but is used
+		as one.
+
+prefix=path
+		Path will be prefixed to every absolute path name of
+		symbolic links on an AFFS partition. Default = "/".
+		(See below.)
+
+volume=name
+		When symbolic links with an absolute path are created
+		on an AFFS partition, name will be prepended as the
+		volume name. Default = "" (empty string).
+		(See below.)
+
+Handling of the Users/Groups and protection flags
+=================================================
+
+Amiga -> Linux:
+
+The Amiga protection flags RWEDRWEDHSPARWED are handled as follows:
+
+  - R maps to r for user, group and others. On directories, R implies x.
+
+  - If both W and D are allowed, w will be set.
+
+  - E maps to x.
+
+  - H and P are always retained and ignored under Linux.
+
+  - A is always reset when a file is written to.
+
+User id and group id will be used unless set[gu]id are given as mount
+options. Since most of the Amiga file systems are single user systems
+they will be owned by root. The root directory (the mount point) of the
+Amiga filesystem will be owned by the user who actually mounts the
+filesystem (the root directory doesn't have uid/gid fields).
+
+Linux -> Amiga:
+
+The Linux rwxrwxrwx file mode is handled as follows:
+
+  - r permission will set R for user, group and others.
+
+  - w permission will set W and D for user, group and others.
+
+  - x permission of the user will set E for plain files.
+
+  - All other flags (suid, sgid, ...) are ignored and will
+    not be retained.
+
+Newly created files and directories will get the user and group ID
+of the current user and a mode according to the umask.
+
+Symbolic links
+==============
+
+Although the Amiga and Linux file systems resemble each other, there
+are some, not always subtle, differences. One of them becomes apparent
+with symbolic links. While Linux has a file system with exactly one
+root directory, the Amiga has a separate root directory for each
+file system (for example, partition, floppy disk, ...). With the Amiga,
+these entities are called "volumes". They have symbolic names which
+can be used to access them. Thus, symbolic links can point to a
+different volume. AFFS turns the volume name into a directory name
+and prepends the prefix path (see prefix option) to it.
+
+Example:
+You mount all your Amiga partitions under /amiga/<volume> (where
+<volume> is the name of the volume), and you give the option
+"prefix=/amiga/" when mounting all your AFFS partitions. (They
+might be "User", "WB" and "Graphics", with mount points /amiga/User,
+/amiga/WB and /amiga/Graphics.) A symbolic link referring to
+"User:sc/include/dos/dos.h" will be followed to
+"/amiga/User/sc/include/dos/dos.h".
+
+Examples
+========
+
+Command line::
+
+    mount  Archive/Amiga/Workbench3.1.adf /mnt -t affs -o loop,verbose
+    mount  /dev/sda3 /Amiga -t affs
+
+/etc/fstab entry::
+
+    /dev/sdb5	/amiga/Workbench    affs    noauto,user,exec,verbose 0 0
+
+IMPORTANT NOTE
+==============
+
+If you boot Windows 95 (don't know about 3.x, 98 and NT) while you
+have an Amiga hard disk connected to your PC, it will overwrite
+the bytes 0x00dc..0x00df of block 0 with garbage, thus invalidating
+the Rigid Disk Block. As luck would have it, this is an unused
+area of the RDB, so only the checksum no longer matches.
+Linux will ignore this garbage and recognize the RDB anyway, but
+before you connect that drive to your Amiga again, you must
+restore or repair your RDB. So please do make a backup copy of it
+before booting Windows!
+
+If the damage is already done, the following should fix the RDB
+(where <disk> is the device name).
+
+DO AT YOUR OWN RISK::
+
+  dd if=/dev/<disk> of=rdb.tmp count=1
+  cp rdb.tmp rdb.fixed
+  dd if=/dev/zero of=rdb.fixed bs=1 seek=220 count=4
+  dd if=rdb.fixed of=/dev/<disk>
+
+Bugs, Restrictions, Caveats
+===========================
+
+Quite a few things may not work as advertised. Not everything is
+tested, though several hundred MB have been read and written using
+this fs. For the most up-to-date list of bugs please consult
+fs/affs/Changes.
+
+By default, filenames are truncated to 30 characters without warning.
+The 'nofilenametruncate' mount option changes that behavior.
+
+Case is ignored by affs in filename matching, but Linux shells
+do care about the case. Example (with /wb being an affs mounted fs)::
+
+    rm /wb/WRONGCASE
+
+will remove /wb/wrongcase, but::
+
+    rm /wb/WR*
+
+will not, since the wildcard is expanded by the (case-sensitive) shell.
+
+The block allocation is designed for hard disk partitions. If more
+than one process writes to a (small) diskette, the blocks are allocated
+in an ugly way (but the real AFFS doesn't do much better). This
+is also true when space gets tight.
+
+You cannot execute programs on an OFS (Old File System), since the
+program files cannot be memory mapped due to the 488-byte blocks.
+For the same reason you cannot mount an image on such a filesystem
+via the loopback device.
+
+The bitmap valid flag in the root block may not be accurate when the
+system crashes while an affs partition is mounted. There's currently
+no way to fix a garbled filesystem without an Amiga (disk validator)
+or manually (who would do this?). Maybe later.
+
+If you mount affs partitions on system startup, you may want to tell
+fsck that the fs should not be checked (place a '0' in the sixth field
+of /etc/fstab).
+
+It's not possible to read Amiga floppy disks with a normal PC or workstation
+due to an incompatibility with the Amiga floppy controller.
+
+If you are interested in an Amiga Emulator for Linux, look at
+
+http://web.archive.org/web/%2A/http://www.freiburg.linux.de/~uae/
diff --git a/Documentation/filesystems/affs.txt b/Documentation/filesystems/affs.txt
deleted file mode 100644
index 71b63c2b9841..000000000000
--- a/Documentation/filesystems/affs.txt
+++ /dev/null
@@ -1,222 +0,0 @@
-Overview of Amiga Filesystems
-=============================
-
-Not all varieties of the Amiga filesystems are supported for reading and
-writing. The Amiga currently knows six different filesystems:
-
-DOS\0		The old or original filesystem, not really suited for
-		hard disks and normally not used on them, either.
-		Supported read/write.
-
-DOS\1		The original Fast File System. Supported read/write.
-
-DOS\2		The old "international" filesystem. International means that
-		a bug has been fixed so that accented ("international") letters
-		in file names are case-insensitive, as they ought to be.
-		Supported read/write.
-
-DOS\3		The "international" Fast File System.  Supported read/write.
-
-DOS\4		The original filesystem with directory cache. The directory
-		cache speeds up directory accesses on floppies considerably,
-		but slows down file creation/deletion. Doesn't make much
-		sense on hard disks. Supported read only.
-
-DOS\5		The Fast File System with directory cache. Supported read only.
-
-All of the above filesystems allow block sizes from 512 to 32K bytes.
-Supported block sizes are: 512, 1024, 2048 and 4096 bytes. Larger blocks
-speed up almost everything at the expense of wasted disk space. The speed
-gain above 4K seems not really worth the price, so you don't lose too
-much here, either.
-
-The muFS (multi user File System) equivalents of the above file systems
-are supported, too.
-
-Mount options for the AFFS
-==========================
-
-protect		If this option is set, the protection bits cannot be altered.
-
-setuid[=uid]	This sets the owner of all files and directories in the file
-		system to uid or the uid of the current user, respectively.
-
-setgid[=gid]	Same as above, but for gid.
-
-mode=mode	Sets the mode flags to the given (octal) value, regardless
-		of the original permissions. Directories will get an x
-		permission if the corresponding r bit is set.
-		This is useful since most of the plain AmigaOS files
-		will map to 600.
-
-nofilenametruncate
-		The file system will return an error when filename exceeds
-		standard maximum filename length (30 characters).
-
-reserved=num	Sets the number of reserved blocks at the start of the
-		partition to num. You should never need this option.
-		Default is 2.
-
-root=block	Sets the block number of the root block. This should never
-		be necessary.
-
-bs=blksize	Sets the blocksize to blksize. Valid block sizes are 512,
-		1024, 2048 and 4096. Like the root option, this should
-		never be necessary, as the affs can figure it out itself.
-
-quiet		The file system will not return an error for disallowed
-		mode changes.
-
-verbose		The volume name, file system type and block size will
-		be written to the syslog when the filesystem is mounted.
-
-mufs		The filesystem is really a muFS, also it doesn't
-		identify itself as one. This option is necessary if
-		the filesystem wasn't formatted as muFS, but is used
-		as one.
-
-prefix=path	Path will be prefixed to every absolute path name of
-		symbolic links on an AFFS partition. Default = "/".
-		(See below.)
-
-volume=name	When symbolic links with an absolute path are created
-		on an AFFS partition, name will be prepended as the
-		volume name. Default = "" (empty string).
-		(See below.)
-
-Handling of the Users/Groups and protection flags
-=================================================
-
-Amiga -> Linux:
-
-The Amiga protection flags RWEDRWEDHSPARWED are handled as follows:
-
-  - R maps to r for user, group and others. On directories, R implies x.
-
-  - If both W and D are allowed, w will be set.
-
-  - E maps to x.
-
-  - H and P are always retained and ignored under Linux.
-
-  - A is always reset when a file is written to.
-
-User id and group id will be used unless set[gu]id are given as mount
-options. Since most of the Amiga file systems are single user systems
-they will be owned by root. The root directory (the mount point) of the
-Amiga filesystem will be owned by the user who actually mounts the
-filesystem (the root directory doesn't have uid/gid fields).
-
-Linux -> Amiga:
-
-The Linux rwxrwxrwx file mode is handled as follows:
-
-  - r permission will set R for user, group and others.
-
-  - w permission will set W and D for user, group and others.
-
-  - x permission of the user will set E for plain files.
-
-  - All other flags (suid, sgid, ...) are ignored and will
-    not be retained.
-    
-Newly created files and directories will get the user and group ID
-of the current user and a mode according to the umask.
-
-Symbolic links
-==============
-
-Although the Amiga and Linux file systems resemble each other, there
-are some, not always subtle, differences. One of them becomes apparent
-with symbolic links. While Linux has a file system with exactly one
-root directory, the Amiga has a separate root directory for each
-file system (for example, partition, floppy disk, ...). With the Amiga,
-these entities are called "volumes". They have symbolic names which
-can be used to access them. Thus, symbolic links can point to a
-different volume. AFFS turns the volume name into a directory name
-and prepends the prefix path (see prefix option) to it.
-
-Example:
-You mount all your Amiga partitions under /amiga/<volume> (where
-<volume> is the name of the volume), and you give the option
-"prefix=/amiga/" when mounting all your AFFS partitions. (They
-might be "User", "WB" and "Graphics", the mount points /amiga/User,
-/amiga/WB and /amiga/Graphics). A symbolic link referring to
-"User:sc/include/dos/dos.h" will be followed to
-"/amiga/User/sc/include/dos/dos.h".
-
-Examples
-========
-
-Command line:
-    mount  Archive/Amiga/Workbench3.1.adf /mnt -t affs -o loop,verbose
-    mount  /dev/sda3 /Amiga -t affs
-
-/etc/fstab entry:
-    /dev/sdb5	/amiga/Workbench    affs    noauto,user,exec,verbose 0 0
-
-IMPORTANT NOTE
-==============
-
-If you boot Windows 95 (don't know about 3.x, 98 and NT) while you
-have an Amiga harddisk connected to your PC, it will overwrite
-the bytes 0x00dc..0x00df of block 0 with garbage, thus invalidating
-the Rigid Disk Block. Sheer luck has it that this is an unused
-area of the RDB, so only the checksum doesn't match anymore.
-Linux will ignore this garbage and recognize the RDB anyway, but
-before you connect that drive to your Amiga again, you must
-restore or repair your RDB. So please do make a backup copy of it
-before booting Windows!
-
-If the damage is already done, the following should fix the RDB
-(where <disk> is the device name).
-DO AT YOUR OWN RISK:
-
-  dd if=/dev/<disk> of=rdb.tmp count=1
-  cp rdb.tmp rdb.fixed
-  dd if=/dev/zero of=rdb.fixed bs=1 seek=220 count=4
-  dd if=rdb.fixed of=/dev/<disk>
-
-Bugs, Restrictions, Caveats
-===========================
-
-Quite a few things may not work as advertised. Not everything is
-tested, though several hundred MB have been read and written using
-this fs. For a most up-to-date list of bugs please consult
-fs/affs/Changes.
-
-By default, filenames are truncated to 30 characters without warning.
-'nofilenametruncate' mount option can change that behavior.
-
-Case is ignored by the affs in filename matching, but Linux shells
-do care about the case. Example (with /wb being an affs mounted fs):
-    rm /wb/WRONGCASE
-will remove /mnt/wrongcase, but
-    rm /wb/WR*
-will not since the names are matched by the shell.
-
-The block allocation is designed for hard disk partitions. If more
-than 1 process writes to a (small) diskette, the blocks are allocated
-in an ugly way (but the real AFFS doesn't do much better). This
-is also true when space gets tight.
-
-You cannot execute programs on an OFS (Old File System), since the
-program files cannot be memory mapped due to the 488 byte blocks.
-For the same reason you cannot mount an image on such a filesystem
-via the loopback device.
-
-The bitmap valid flag in the root block may not be accurate when the
-system crashes while an affs partition is mounted. There's currently
-no way to fix a garbled filesystem without an Amiga (disk validator)
-or manually (who would do this?). Maybe later.
-
-If you mount affs partitions on system startup, you may want to tell
-fsck that the fs should not be checked (place a '0' in the sixth field
-of /etc/fstab).
-
-It's not possible to read floppy disks with a normal PC or workstation
-due to an incompatibility with the Amiga floppy controller.
-
-If you are interested in an Amiga Emulator for Linux, look at
-
-http://web.archive.org/web/*/http://www.freiburg.linux.de/~uae/
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 14dc89c94822..273d802ad5fb 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -48,6 +48,7 @@ Documentation for filesystem implementations.
 
    9p
    adfs
+   affs
    autofs
    fuse
    overlayfs
-- 
cgit 


From ca6e9049a0934fe72ffea6990c889205aff0a2cf Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:50 +0100
Subject: docs: filesystems: convert afs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Comment out text-only ToC;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/d77f5afdb5da0f8b0ec3dbe720aef23f1ce73bb5.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/afs.rst   | 251 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/afs.txt   | 258 ------------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 252 insertions(+), 258 deletions(-)
 create mode 100644 Documentation/filesystems/afs.rst
 delete mode 100644 Documentation/filesystems/afs.txt

diff --git a/Documentation/filesystems/afs.rst b/Documentation/filesystems/afs.rst
new file mode 100644
index 000000000000..c4ec39a5966e
--- /dev/null
+++ b/Documentation/filesystems/afs.rst
@@ -0,0 +1,251 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+kAFS: AFS FILESYSTEM
+====================
+
+.. Contents:
+
+ - Overview.
+ - Usage.
+ - Mountpoints.
+ - Dynamic root.
+ - Proc filesystem.
+ - The cell database.
+ - Security.
+ - The @sys substitution.
+
+
+Overview
+========
+
+This filesystem provides a fairly simple secure AFS filesystem driver. It is
+under development and does not yet provide the full feature set.  The features
+it does support include:
+
+ (*) Security (currently only AFS kaserver and KerberosIV tickets).
+
+ (*) File reading and writing.
+
+ (*) Automounting.
+
+ (*) Local caching (via fscache).
+
+It does not yet support the following AFS features:
+
+ (*) pioctl() system call.
+
+
+Compilation
+===========
+
+The filesystem should be enabled by turning on the kernel configuration
+options::
+
+	CONFIG_AF_RXRPC		- The RxRPC protocol transport
+	CONFIG_RXKAD		- The RxRPC Kerberos security handler
+	CONFIG_AFS		- The AFS filesystem
+
+Additionally, the following can be turned on to aid debugging::
+
+	CONFIG_AF_RXRPC_DEBUG	- Permit AF_RXRPC debugging to be enabled
+	CONFIG_AFS_DEBUG	- Permit AFS debugging to be enabled
+
+They permit the debugging messages to be turned on dynamically by manipulating
+the masks in the following files::
+
+	/sys/module/af_rxrpc/parameters/debug
+	/sys/module/kafs/parameters/debug
+
+
+Usage
+=====
+
+When inserting the driver modules, the root cell must be specified along with a
+list of volume location server IP addresses::
+
+	modprobe rxrpc
+	modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
+
+The first module is the AF_RXRPC network protocol driver.  This provides the
+RxRPC remote operation protocol and may also be accessed from userspace.  See:
+
+	Documentation/networking/rxrpc.txt
+
+The second module is the actual filesystem driver for the AFS filesystem.  The
+Kerberos RxRPC security handler (CONFIG_RXKAD) is part of the first module.
+
+Once the module has been loaded, more cells can be added by the following
+procedure::
+
+	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
+
+Where the parameters to the "add" command are the name of a cell and a list of
+volume location servers within that cell, with the latter separated by colons.
+
+Filesystems can be mounted anywhere by commands similar to the following::
+
+	mount -t afs "%cambridge.redhat.com:root.afs." /afs
+	mount -t afs "#cambridge.redhat.com:root.cell." /afs/cambridge
+	mount -t afs "#root.afs." /afs
+	mount -t afs "#root.cell." /afs/cambridge
+
+Where the initial character is either a hash or a percent symbol depending on
+whether you definitely want a R/W volume (percent) or whether you'd prefer a
+R/O volume, but are willing to use a R/W volume instead (hash).
+
+The name of the volume can be suffixed with ".backup" or ".readonly" to
+restrict the connection to volumes of those types.
+
+The name of the cell is optional, and if not given during a mount, then the
+named volume will be looked up in the cell specified during modprobe.
+
+Additional cells can be added through /proc (see later section).
+
+
+Mountpoints
+===========
+
+AFS has a concept of mountpoints. In AFS terms, these are specially formatted
+symbolic links (of the same form as the "device name" passed to mount).  kAFS
+presents these to the user as directories that have a follow-link capability
+(i.e. symbolic link semantics).  If anyone attempts to access them, they will
+automatically cause the target volume to be mounted (if possible) on that site.
+
+Automatically mounted filesystems will be automatically unmounted approximately
+twenty minutes after they were last used.  Alternatively they can be unmounted
+directly with the umount() system call.
+
+Manually unmounting an AFS volume will cause any idle submounts upon it to be
+culled first.  If all are culled, then the requested volume will also be
+unmounted, otherwise error EBUSY will be returned.
+
+This can be used by the administrator to attempt to unmount the whole AFS tree
+mounted on /afs in one go by doing::
+
+	umount /afs
+
+
+Dynamic Root
+============
+
+A mount option is available to create a serverless mount that is only usable
+for dynamic lookup.  Creating such a mount can be done by, for example::
+
+	mount -t afs none /afs -o dyn
+
+This creates a mount that just has an empty directory at the root.  Attempting
+to look up a name in this directory will cause a mountpoint to be created that
+looks up a cell of the same name, for example::
+
+	ls /afs/grand.central.org/
+
+
+Proc Filesystem
+===============
+
+The AFS module creates a "/proc/fs/afs/" directory and populates it:
+
+  (*) A "cells" file that lists cells currently known to the afs module and
+      their usage counts::
+
+	[root@andromeda ~]# cat /proc/fs/afs/cells
+	USE NAME
+	  3 cambridge.redhat.com
+
+  (*) A directory per cell that contains files that list volume location
+      servers, volumes, and active servers known within that cell::
+
+	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/servers
+	USE ADDR            STATE
+	  4 172.16.18.91        0
+	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/vlservers
+	ADDRESS
+	172.16.18.91
+	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/volumes
+	USE STT VLID[0]  VLID[1]  VLID[2]  NAME
+	  1 Val 20000000 20000001 20000002 root.afs
+
+
+The Cell Database
+=================
+
+The filesystem maintains an internal database of all the cells it knows and the
+IP addresses of the volume location servers for those cells.  The cell to which
+the system belongs is added to the database when modprobe is performed by the
+"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
+the kernel command line.
+
+Further cells can be added by commands similar to the following::
+
+	echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
+	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
+
+No other cell database operations are available at this time.
+
+
+Security
+========
+
+Secure operations are initiated by acquiring a key using the klog program.  A
+very primitive klog program is available at:
+
+	http://people.redhat.com/~dhowells/rxrpc/klog.c
+
+This should be compiled by::
+
+	make klog LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"
+
+And then run as::
+
+	./klog
+
+Assuming it's successful, this adds a key of type RxRPC, named for the service
+and cell, e.g. "afs@<cellname>".  This can be viewed with the keyctl program or
+by cat'ing /proc/keys::
+
+	[root@andromeda ~]# keyctl show
+	Session Keyring
+	       -3 --alswrv      0     0  keyring: _ses.3268
+		2 --alswrv      0     0   \_ keyring: _uid.0
+	111416553 --als--v      0     0   \_ rxrpc: afs@CAMBRIDGE.REDHAT.COM
+
+Currently the username, realm, password and proposed ticket lifetime are
+compiled into the program.
+
+It is not required to acquire a key before using AFS facilities, but if one is
+not acquired then all operations will be governed by the anonymous user parts
+of the ACLs.
+
+If a key is acquired, then all AFS operations, including mounts and automounts,
+made by a possessor of that key will be secured with that key.
+
+If a file is opened with a particular key and then the file descriptor is
+passed to a process that doesn't have that key (perhaps over an AF_UNIX
+socket), then the operations on the file will be made with the key that was used to
+open the file.
+
+
+The @sys Substitution
+=====================
+
+The list of up to 16 @sys substitutions for the current network namespace can
+be configured by writing a list to /proc/fs/afs/sysname::
+
+	[root@andromeda ~]# echo foo amd64_linux_26 >/proc/fs/afs/sysname
+
+or cleared entirely by writing an empty list::
+
+	[root@andromeda ~]# echo >/proc/fs/afs/sysname
+
+The current list for the current network namespace can be retrieved by::
+
+	[root@andromeda ~]# cat /proc/fs/afs/sysname
+	foo
+	amd64_linux_26
+
+When @sys is being substituted for, each element of the list is tried in the
+order given.
+
+By default, the list will contain one item that conforms to the pattern
+"<arch>_linux_26", amd64 being the name for x86_64.
diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt
deleted file mode 100644
index 8c6ea7b41048..000000000000
--- a/Documentation/filesystems/afs.txt
+++ /dev/null
@@ -1,258 +0,0 @@
-			     ====================
-			     kAFS: AFS FILESYSTEM
-			     ====================
-
-Contents:
-
- - Overview.
- - Usage.
- - Mountpoints.
- - Dynamic root.
- - Proc filesystem.
- - The cell database.
- - Security.
- - The @sys substitution.
-
-
-========
-OVERVIEW
-========
-
-This filesystem provides a fairly simple secure AFS filesystem driver. It is
-under development and does not yet provide the full feature set.  The features
-it does support include:
-
- (*) Security (currently only AFS kaserver and KerberosIV tickets).
-
- (*) File reading and writing.
-
- (*) Automounting.
-
- (*) Local caching (via fscache).
-
-It does not yet support the following AFS features:
-
- (*) pioctl() system call.
-
-
-===========
-COMPILATION
-===========
-
-The filesystem should be enabled by turning on the kernel configuration
-options:
-
-	CONFIG_AF_RXRPC		- The RxRPC protocol transport
-	CONFIG_RXKAD		- The RxRPC Kerberos security handler
-	CONFIG_AFS		- The AFS filesystem
-
-Additionally, the following can be turned on to aid debugging:
-
-	CONFIG_AF_RXRPC_DEBUG	- Permit AF_RXRPC debugging to be enabled
-	CONFIG_AFS_DEBUG	- Permit AFS debugging to be enabled
-
-They permit the debugging messages to be turned on dynamically by manipulating
-the masks in the following files:
-
-	/sys/module/af_rxrpc/parameters/debug
-	/sys/module/kafs/parameters/debug
-
-
-=====
-USAGE
-=====
-
-When inserting the driver modules the root cell must be specified along with a
-list of volume location server IP addresses:
-
-	modprobe rxrpc
-	modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
-
-The first module is the AF_RXRPC network protocol driver.  This provides the
-RxRPC remote operation protocol and may also be accessed from userspace.  See:
-
-	Documentation/networking/rxrpc.txt
-
-The second module is the kerberos RxRPC security driver, and the third module
-is the actual filesystem driver for the AFS filesystem.
-
-Once the module has been loaded, more modules can be added by the following
-procedure:
-
-	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
-
-Where the parameters to the "add" command are the name of a cell and a list of
-volume location servers within that cell, with the latter separated by colons.
-
-Filesystems can be mounted anywhere by commands similar to the following:
-
-	mount -t afs "%cambridge.redhat.com:root.afs." /afs
-	mount -t afs "#cambridge.redhat.com:root.cell." /afs/cambridge
-	mount -t afs "#root.afs." /afs
-	mount -t afs "#root.cell." /afs/cambridge
-
-Where the initial character is either a hash or a percent symbol depending on
-whether you definitely want a R/W volume (percent) or whether you'd prefer a
-R/O volume, but are willing to use a R/W volume instead (hash).
-
-The name of the volume can be suffixes with ".backup" or ".readonly" to
-specify connection to only volumes of those types.
-
-The name of the cell is optional, and if not given during a mount, then the
-named volume will be looked up in the cell specified during modprobe.
-
-Additional cells can be added through /proc (see later section).
-
-
-===========
-MOUNTPOINTS
-===========
-
-AFS has a concept of mountpoints. In AFS terms, these are specially formatted
-symbolic links (of the same form as the "device name" passed to mount).  kAFS
-presents these to the user as directories that have a follow-link capability
-(ie: symbolic link semantics).  If anyone attempts to access them, they will
-automatically cause the target volume to be mounted (if possible) on that site.
-
-Automatically mounted filesystems will be automatically unmounted approximately
-twenty minutes after they were last used.  Alternatively they can be unmounted
-directly with the umount() system call.
-
-Manually unmounting an AFS volume will cause any idle submounts upon it to be
-culled first.  If all are culled, then the requested volume will also be
-unmounted, otherwise error EBUSY will be returned.
-
-This can be used by the administrator to attempt to unmount the whole AFS tree
-mounted on /afs in one go by doing:
-
-	umount /afs
-
-
-============
-DYNAMIC ROOT
-============
-
-A mount option is available to create a serverless mount that is only usable
-for dynamic lookup.  Creating such a mount can be done by, for example:
-
-	mount -t afs none /afs -o dyn
-
-This creates a mount that just has an empty directory at the root.  Attempting
-to look up a name in this directory will cause a mountpoint to be created that
-looks up a cell of the same name, for example:
-
-	ls /afs/grand.central.org/
-
-
-===============
-PROC FILESYSTEM
-===============
-
-The AFS modules creates a "/proc/fs/afs/" directory and populates it:
-
-  (*) A "cells" file that lists cells currently known to the afs module and
-      their usage counts:
-
-	[root@andromeda ~]# cat /proc/fs/afs/cells
-	USE NAME
-	  3 cambridge.redhat.com
-
-  (*) A directory per cell that contains files that list volume location
-      servers, volumes, and active servers known within that cell.
-
-	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/servers
-	USE ADDR            STATE
-	  4 172.16.18.91        0
-	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/vlservers
-	ADDRESS
-	172.16.18.91
-	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/volumes
-	USE STT VLID[0]  VLID[1]  VLID[2]  NAME
-	  1 Val 20000000 20000001 20000002 root.afs
-
-
-=================
-THE CELL DATABASE
-=================
-
-The filesystem maintains an internal database of all the cells it knows and the
-IP addresses of the volume location servers for those cells.  The cell to which
-the system belongs is added to the database when modprobe is performed by the
-"rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
-the kernel command line.
-
-Further cells can be added by commands similar to the following:
-
-	echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
-	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
-
-No other cell database operations are available at this time.
-
-
-========
-SECURITY
-========
-
-Secure operations are initiated by acquiring a key using the klog program.  A
-very primitive klog program is available at:
-
-	http://people.redhat.com/~dhowells/rxrpc/klog.c
-
-This should be compiled by:
-
-	make klog LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"
-
-And then run as:
-
-	./klog
-
-Assuming it's successful, this adds a key of type RxRPC, named for the service
-and cell, eg: "afs@<cellname>".  This can be viewed with the keyctl program or
-by cat'ing /proc/keys:
-
-	[root@andromeda ~]# keyctl show
-	Session Keyring
-	       -3 --alswrv      0     0  keyring: _ses.3268
-		2 --alswrv      0     0   \_ keyring: _uid.0
-	111416553 --als--v      0     0   \_ rxrpc: afs@CAMBRIDGE.REDHAT.COM
-
-Currently the username, realm, password and proposed ticket lifetime are
-compiled in to the program.
-
-It is not required to acquire a key before using AFS facilities, but if one is
-not acquired then all operations will be governed by the anonymous user parts
-of the ACLs.
-
-If a key is acquired, then all AFS operations, including mounts and automounts,
-made by a possessor of that key will be secured with that key.
-
-If a file is opened with a particular key and then the file descriptor is
-passed to a process that doesn't have that key (perhaps over an AF_UNIX
-socket), then the operations on the file will be made with key that was used to
-open the file.
-
-
-=====================
-THE @SYS SUBSTITUTION
-=====================
-
-The list of up to 16 @sys substitutions for the current network namespace can
-be configured by writing a list to /proc/fs/afs/sysname:
-
-	[root@andromeda ~]# echo foo amd64_linux_26 >/proc/fs/afs/sysname
-
-or cleared entirely by writing an empty list:
-
-	[root@andromeda ~]# echo >/proc/fs/afs/sysname
-
-The current list for current network namespace can be retrieved by:
-
-	[root@andromeda ~]# cat /proc/fs/afs/sysname
-	foo
-	amd64_linux_26
-
-When @sys is being substituted for, each element of the list is tried in the
-order given.
-
-By default, the list will contain one item that conforms to the pattern
-"<arch>_linux_26", amd64 being the name for x86_64.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 273d802ad5fb..0598bc52abdc 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -49,6 +49,7 @@ Documentation for filesystem implementations.
    9p
    adfs
    affs
+   afs
    autofs
    fuse
    overlayfs
-- 
cgit 


From c64d3dc69f38a08d082813f1c0425d7a108ef950 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:51 +0100
Subject: docs: filesystems: convert autofs-mount-control.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/8cae057ae244d0f5b58d3c209bcdae5ed82bc52c.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/autofs-mount-control.rst | 410 +++++++++++++++++++++
 Documentation/filesystems/autofs-mount-control.txt | 408 --------------------
 Documentation/filesystems/index.rst                |   1 +
 3 files changed, 411 insertions(+), 408 deletions(-)
 create mode 100644 Documentation/filesystems/autofs-mount-control.rst
 delete mode 100644 Documentation/filesystems/autofs-mount-control.txt

diff --git a/Documentation/filesystems/autofs-mount-control.rst b/Documentation/filesystems/autofs-mount-control.rst
new file mode 100644
index 000000000000..2903aed92316
--- /dev/null
+++ b/Documentation/filesystems/autofs-mount-control.rst
@@ -0,0 +1,410 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================================================================
+Miscellaneous Device control operations for the autofs kernel module
+====================================================================
+
+The problem
+===========
+
+There is a problem with active restarts in autofs (that is to say
+restarting autofs when there are busy mounts).
+
+During normal operation autofs uses a file descriptor opened on the
+directory that is being managed in order to be able to issue control
+operations. Using a file descriptor gives ioctl operations access to
+autofs specific information stored in the super block. The operations
+are things such as setting an autofs mount catatonic, setting the
+expire timeout and requesting expire checks. As is explained below,
+certain types of autofs triggered mounts can end up covering an autofs
+mount itself, which prevents us from being able to use open(2) to obtain a
+file descriptor for these operations if we don't already have one open.
+
+Currently autofs uses "umount -l" (lazy umount) to clear active mounts
+at restart. While using lazy umount works for most cases, anything that
+needs to walk back up the mount tree to construct a path, such as
+getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works
+because the point from which the path is constructed has been detached
+from the mount tree.
+
+The actual problem with autofs is that it can't reconnect to existing
+mounts. One's first thought is that adding the ability to remount autofs
+file systems would solve it, but alas, that can't work. This is
+because autofs direct mounts and the implementation of "on demand mount
+and expire" of nested mount trees have the file system mounted directly
+on top of the mount trigger directory dentry.
+
+There are two types of automount maps: direct (in the kernel module
+source you will see a third type called an offset, which is just a
+direct mount in disguise) and indirect.
+
+Here is a master map with direct and indirect map entries::
+
+    /-      /etc/auto.direct
+    /test   /etc/auto.indirect
+
+and the corresponding map files, /etc/auto.direct::
+
+    /automount/dparse/g6  budgie:/autofs/export1
+    /automount/dparse/g1  shark:/autofs/export1
+    and so on.
+
+and /etc/auto.indirect::
+
+    g1    shark:/autofs/export1
+    g6    budgie:/autofs/export1
+    and so on.
+
+For the above indirect map an autofs file system is mounted on /test and
+mounts are triggered for each sub-directory key by the inode lookup
+operation. So we see a mount of shark:/autofs/export1 on /test/g1, for
+example.
+
+The way that direct mounts are handled is by making an autofs mount on
+each full path, such as /automount/dparse/g1, and using it as a mount
+trigger. So when we walk the path, we mount shark:/autofs/export1 "on
+top of this mount point". Since these are always directories, we can
+use the follow_link inode operation to trigger the mount.
+
+However, each entry in direct and indirect maps can have offsets (making
+them multi-mount map entries).
+
+For example, an indirect mount map entry could also be::
+
+    g1  \
+    /        shark:/autofs/export5/testing/test \
+    /s1      shark:/autofs/export/testing/test/s1 \
+    /s2      shark:/autofs/export5/testing/test/s2 \
+    /s1/ss1  shark:/autofs/export1 \
+    /s2/ss2  shark:/autofs/export2
+
+and, similarly, a direct mount map entry could be::
+
+    /automount/dparse/g1 \
+	/       shark:/autofs/export5/testing/test \
+	/s1     shark:/autofs/export/testing/test/s1 \
+	/s2     shark:/autofs/export5/testing/test/s2 \
+	/s1/ss1 shark:/autofs/export2 \
+	/s2/ss2 shark:/autofs/export2
+
+One of the issues with version 4 of autofs was that, when mounting an
+entry with a large number of offsets, possibly with nesting, we needed
+to mount and umount all of the offsets as a single unit. Not really a
+problem, except for people with a large number of offsets in map entries.
+This mechanism is used for the well-known "hosts" map, and we have seen
+cases (in 2.4) where the available number of mounts is exhausted or
+where the number of available privileged ports is exhausted.
+
+In version 5, offsets are mounted only as we go down the tree of offsets,
+and are expired in the same way, which resolves the above problem. There
+is somewhat more detail to the implementation, but it isn't needed for the
+sake of the problem explanation. The one important detail is that these
+offsets are implemented using the same mechanism as the direct mounts
+above, so the mount points can be covered by a mount.
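+
+As a purely conceptual illustration (not code from autofs), the
+difference can be pictured by counting how many offsets are touched when
+an entry is first triggered. The sketch assumes that mounting an offset
+only arms triggers for the offsets directly beneath it. The helper names
+and the flat array of offset strings are hypothetical::
+
+    #include <assert.h>
+    #include <stddef.h>
+    #include <string.h>
+
+    /* Hypothetical helper: is offset 'a' a strict ancestor of offset 'b'?
+     * Offsets are relative to the entry, e.g. "/", "/s1", "/s1/ss1". */
+    static int is_ancestor(const char *a, const char *b)
+    {
+        size_t len = strlen(a);
+
+        if (strcmp(a, b) == 0)
+            return 0;
+        if (strcmp(a, "/") == 0)
+            return 1;           /* the root offset is above everything */
+        return strncmp(a, b, len) == 0 && b[len] == '/';
+    }
+
+    /* Count the offsets whose nearest ancestor in 'offs' is 'parent',
+     * i.e. the triggers a lazy walk would arm once 'parent' is mounted. */
+    static size_t count_immediate_children(const char *const *offs, size_t n,
+                                           const char *parent)
+    {
+        size_t count = 0;
+
+        assert(offs != NULL || n == 0);
+        for (size_t i = 0; i < n; i++) {
+            const char *nearest = NULL;
+
+            for (size_t j = 0; j < n; j++) {
+                if (is_ancestor(offs[j], offs[i]) &&
+                    (nearest == NULL || strlen(offs[j]) > strlen(nearest)))
+                    nearest = offs[j];
+            }
+            if (nearest != NULL && strcmp(nearest, parent) == 0)
+                count++;
+        }
+        return count;
+    }
+
+For the g1 entry above, a version 4 style mount handles all five offsets
+as one unit. With the offsets "/", "/s1", "/s2", "/s1/ss1" and "/s2/ss2",
+the root offset has only two immediate children (/s1 and /s2), so only
+those two triggers would be armed when g1 is first accessed.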
+
+The current autofs implementation uses an ioctl file descriptor opened
+on the mount point for control operations. The references held by the
+descriptor are accounted for in checks made to determine if a mount is
+in use, and the descriptor is also used to access autofs file system
+information held in the mount super block. So the use of a file handle
+needs to be retained.
+
+
+The Solution
+============
+
+To be able to restart autofs leaving existing direct, indirect and
+offset mounts in place we need to be able to obtain a file handle
+for these potentially covered autofs mount points. Rather than just
+implement an isolated operation, it was decided to re-implement the
+existing ioctl interface and add new operations to provide this
+functionality.
+
+In addition, to be able to reconstruct a mount tree that has busy mounts,
+the uid and gid of the last user that triggered the mount need to be
+available because these can be used as macro substitution variables in
+autofs maps. They are recorded at mount request time and an operation
+has been added to retrieve them.
+
+Since we're re-implementing the control interface, a couple of other
+problems with the existing interface have been addressed. First, when
+a mount or expire operation completes, a status is returned to the
+kernel by either a "send ready" or a "send fail" operation. The
+"send fail" operation of the ioctl interface could only ever send
+ENOENT, so the re-implementation allows user space to send an actual
+status. Another expensive operation in user space, for those using
+very large maps, is discovering if a mount is present. Usually this
+involves scanning /proc/mounts and, since it needs to be done quite
+often, it can introduce significant overhead when there are many entries
+in the mount table. An operation to look up the mount status of a mount
+point dentry (covered or not) has also been added.
+
+Current kernel development policy recommends avoiding the use of the
+ioctl mechanism in favor of systems such as Netlink. An implementation
+using this system was attempted to evaluate its suitability, and it was
+found to be inadequate in this case. The Generic Netlink system was
+used for this, as raw Netlink would lead to a significant increase in
+complexity. There's no question that the Generic Netlink system is an
+elegant solution for common case ioctl functions, but it's not a complete
+replacement, probably because its primary purpose in life is to be a
+message bus implementation rather than specifically an ioctl replacement.
+While it would be possible to work around this, there is one concern
+that led to the decision not to use it. This is that the autofs
+expire in the daemon has become far too complex because umount
+candidates are enumerated, almost for no other reason than to "count"
+the number of times to call the expire ioctl. This involves scanning
+the mount table, which has proved to be a big overhead for users with
+large maps. The best way to improve this is to try to get back to the
+way the expire was done long ago. That is, when an expire request is
+issued for a mount (file handle), we should continually call back to
+the daemon until we can't umount any more mounts, then return the
+appropriate status to the daemon. At the moment we just expire one
+mount at a time. A Generic Netlink implementation would exclude this
+possibility for future development due to the requirements of the
+message bus architecture.
+
+
+autofs Miscellaneous Device mount control interface
+====================================================
+
+The control interface is accessed by opening a device node, typically
+/dev/autofs.
+
+All the ioctls use a common structure to pass the needed parameter
+information and return operation results::
+
+    struct autofs_dev_ioctl {
+	    __u32 ver_major;
+	    __u32 ver_minor;
+	    __u32 size;             /* total size of data passed in
+				    * including this struct */
+	    __s32 ioctlfd;          /* automount command fd */
+
+	    /* Command parameters */
+	    union {
+		    struct args_protover		protover;
+		    struct args_protosubver		protosubver;
+		    struct args_openmount		openmount;
+		    struct args_ready		ready;
+		    struct args_fail		fail;
+		    struct args_setpipefd		setpipefd;
+		    struct args_timeout		timeout;
+		    struct args_requester		requester;
+		    struct args_expire		expire;
+		    struct args_askumount		askumount;
+		    struct args_ismountpoint	ismountpoint;
+	    };
+
+	    char path[0];
+    };
+
+The ioctlfd field is a mount point file descriptor of an autofs mount
+point. It is returned by the open call and is used by all calls except
+two: the check for whether a given path is a mount point, where it may
+optionally be used to check a specific mount corresponding to a given
+mount point file descriptor, and the request for the uid and gid of the
+last successful mount on a directory within the autofs file system.
+
+The union is used to communicate parameters and results of calls made
+as described below.
+
+The path field is used to pass a path where it is needed and the size field
+is used to account for the increased structure length when translating the
+structure sent from user space.
+
+This structure can be initialized, before setting specific fields, by
+calling the void function init_autofs_dev_ioctl(``struct autofs_dev_ioctl *``).
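+
+A minimal user space sketch of that initialization, together with a
+request that carries a path, is shown below. The structure definitions
+and the 1.1 interface version are local copies of what
+<linux/auto_dev-ioctl.h> is assumed to provide, so check them against the
+installed header. The initializer is written to match what that function
+is assumed to do: zero the structure, fill in the version and size, and
+set ioctlfd to -1. alloc_dev_ioctl_path() is a hypothetical helper::
+
+    #include <assert.h>
+    #include <stdint.h>
+    #include <stdlib.h>
+    #include <string.h>
+
+    /* Assumed copy of the uapi definitions; verify against your headers. */
+    #define AUTOFS_DEV_IOCTL_VERSION_MAJOR 1
+    #define AUTOFS_DEV_IOCTL_VERSION_MINOR 1
+
+    struct args_protover     { uint32_t version; };
+    struct args_protosubver  { uint32_t sub_version; };
+    struct args_openmount    { uint32_t devid; };
+    struct args_ready        { uint32_t token; };
+    struct args_fail         { uint32_t token; int32_t status; };
+    struct args_setpipefd    { int32_t pipefd; };
+    struct args_timeout      { uint64_t timeout; };
+    struct args_requester    { uint32_t uid; uint32_t gid; };
+    struct args_expire       { uint32_t how; };
+    struct args_askumount    { uint32_t may_umount; };
+    struct args_ismountpoint {
+        union {
+            struct { uint32_t type; } in;
+            struct { uint32_t devid; uint32_t magic; } out;
+        };
+    };
+
+    struct autofs_dev_ioctl {
+        uint32_t ver_major;
+        uint32_t ver_minor;
+        uint32_t size;          /* total size, including any trailing path */
+        int32_t ioctlfd;        /* -1 when no mount descriptor is used */
+        union {
+            struct args_protover     protover;
+            struct args_protosubver  protosubver;
+            struct args_openmount    openmount;
+            struct args_ready        ready;
+            struct args_fail         fail;
+            struct args_setpipefd    setpipefd;
+            struct args_timeout      timeout;
+            struct args_requester    requester;
+            struct args_expire       expire;
+            struct args_askumount    askumount;
+            struct args_ismountpoint ismountpoint;
+        };
+        char path[];            /* flexible array, path[0] in the listing */
+    };
+
+    static void init_autofs_dev_ioctl(struct autofs_dev_ioctl *in)
+    {
+        memset(in, 0, sizeof(*in));
+        in->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR;
+        in->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR;
+        in->size = sizeof(*in);
+        in->ioctlfd = -1;
+    }
+
+    /* Hypothetical helper: allocate a request that carries 'path'. The size
+     * field covers the fixed structure plus the path and its NUL. */
+    static struct autofs_dev_ioctl *alloc_dev_ioctl_path(const char *path)
+    {
+        size_t plen;
+        struct autofs_dev_ioctl *req;
+
+        assert(path != NULL && path[0] == '/');
+        plen = strlen(path) + 1;
+        req = malloc(sizeof(*req) + plen);
+        if (req == NULL)
+            return NULL;
+        init_autofs_dev_ioctl(req);
+        req->size = sizeof(*req) + plen;
+        memcpy(req->path, path, plen);
+        return req;
+    }
+
+The caller then fills in any command-specific union fields. It frees the
+request once the ioctl has returned.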
+
+All of the ioctls perform a copy of this structure from user space to
+kernel space and return -EINVAL if the size parameter is smaller than
+the structure size itself, -ENOMEM if the kernel memory allocation fails
+or -EFAULT if the copy itself fails. Other checks include a version check
+of the compiled-in user space version against the module version, and a
+mismatch results in a -EINVAL return. If the size field is greater than
+the structure size, then a path is assumed to be present and is checked to
+ensure it begins with a "/" and is NUL terminated, otherwise -EINVAL is
+returned. Following these checks, for all ioctl commands except
+AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and
+AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD, the ioctlfd is validated, and if it is
+not a valid descriptor or doesn't correspond to an autofs mount point
+an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is
+returned.
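+
+The header checks above can be restated as a small user space function,
+for example to unit test request construction before anything reaches the
+module. This sketch follows the description literally, so any version
+difference counts as a mismatch. It is not a copy of the kernel code.
+struct dev_ioctl_hdr, the fixed_size parameter (standing in for the size
+of struct autofs_dev_ioctl) and the 1.1 version numbers are assumptions::
+
+    #include <assert.h>
+    #include <errno.h>
+    #include <stddef.h>
+    #include <stdint.h>
+    #include <string.h>
+
+    /* Assumed module version; the real values come from the kernel headers. */
+    #define MODULE_VER_MAJOR 1
+    #define MODULE_VER_MINOR 1
+
+    /* Hypothetical stand-in for the fixed header fields of a request. */
+    struct dev_ioctl_hdr {
+        uint32_t ver_major;
+        uint32_t ver_minor;
+        uint32_t size;
+        int32_t ioctlfd;
+    };
+
+    /* 'fixed_size' is the structure size without a path and 'path' points
+     * at the bytes that follow it. Returns 0 or a negative errno value. */
+    static int check_request(const struct dev_ioctl_hdr *hdr,
+                             size_t fixed_size, const char *path)
+    {
+        size_t path_len;
+
+        assert(hdr != NULL);
+        if (hdr->size < fixed_size)
+            return -EINVAL;             /* too small to be a request */
+        if (hdr->ver_major != MODULE_VER_MAJOR ||
+            hdr->ver_minor != MODULE_VER_MINOR)
+            return -EINVAL;             /* version mismatch */
+        if (hdr->size == fixed_size)
+            return 0;                   /* no path present */
+
+        path_len = hdr->size - fixed_size;
+        if (path == NULL || memchr(path, '\0', path_len) == NULL)
+            return -EINVAL;             /* not NUL terminated within size */
+        if (path[0] != '/')
+            return -EINVAL;             /* must begin with "/" */
+        return 0;
+    }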
+
+
+The ioctls
+==========
+
+An example of an implementation which uses this interface can be seen
+in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the
+distribution tar available for download from kernel.org in directory
+/pub/linux/daemons/autofs/v5.
+
+The device node ioctl operations implemented by this interface are:
+
+
+AUTOFS_DEV_IOCTL_VERSION
+------------------------
+
+Get the major and minor version of the autofs device ioctl kernel module
+implementation. It requires an initialized struct autofs_dev_ioctl as an
+input parameter and sets the version information in the passed in structure.
+It returns 0 on success or the error -EINVAL if a version mismatch is
+detected.
+
+
+AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD
+------------------------------------------------------------------
+
+Get the major and minor version of the autofs protocol version understood
+by the loaded module. This call requires an initialized struct autofs_dev_ioctl
+with the ioctlfd field set to a valid autofs mount point descriptor
+and sets the requested version number in the version field of struct args_protover
+or the sub_version field of struct args_protosubver. These commands return
+0 on success or one of the negative error codes if validation fails.
+
+
+AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT
+----------------------------------------------------------
+
+Obtain and release a file descriptor for an autofs managed mount point
+path. The open call requires an initialized struct autofs_dev_ioctl with
+the path field set and the size field adjusted appropriately, as well
+as the devid field of struct args_openmount set to the device number of
+the autofs mount. The device number can be obtained from the mount options
+shown in /proc/mounts. The close call requires an initialized struct
+autofs_dev_ioctl with the ioctlfd field set to the descriptor obtained
+from the open call. The descriptor can also be released with close(2),
+and any open descriptors are closed at process exit anyway. The close
+call is included in the implemented operations largely for completeness
+and to provide for a consistent user space implementation.
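+
+The open and close sequence might look like the sketch below. It assumes
+that the control device (typically /dev/autofs) is already open as devfd
+and that the caller has the mount's device number. The structure layout,
+version numbers and command codes are local copies of what
+<linux/auto_dev-ioctl.h> is assumed to define. The devid packing is
+assumed to match the kernel's new_encode_dev(). autofs_open_mount() and
+autofs_close_mount() are hypothetical helper names::
+
+    #include <assert.h>
+    #include <errno.h>
+    #include <stdint.h>
+    #include <stdlib.h>
+    #include <string.h>
+    #include <sys/ioctl.h>
+    #include <sys/sysmacros.h>
+    #include <sys/types.h>
+
+    /* Assumed copies of the uapi definitions; verify against your headers. */
+    #define AUTOFS_DEV_IOCTL_VERSION_MAJOR  1
+    #define AUTOFS_DEV_IOCTL_VERSION_MINOR  1
+    #define AUTOFS_IOCTL                    0x93
+    #define AUTOFS_DEV_IOCTL_OPENMOUNT_CMD  0x74
+    #define AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD 0x75
+
+    struct args_protover     { uint32_t version; };
+    struct args_protosubver  { uint32_t sub_version; };
+    struct args_openmount    { uint32_t devid; };
+    struct args_ready        { uint32_t token; };
+    struct args_fail         { uint32_t token; int32_t status; };
+    struct args_setpipefd    { int32_t pipefd; };
+    struct args_timeout      { uint64_t timeout; };
+    struct args_requester    { uint32_t uid; uint32_t gid; };
+    struct args_expire       { uint32_t how; };
+    struct args_askumount    { uint32_t may_umount; };
+    struct args_ismountpoint {
+        union {
+            struct { uint32_t type; } in;
+            struct { uint32_t devid; uint32_t magic; } out;
+        };
+    };
+
+    struct autofs_dev_ioctl {
+        uint32_t ver_major, ver_minor, size;
+        int32_t ioctlfd;
+        union {
+            struct args_protover protover;
+            struct args_protosubver protosubver;
+            struct args_openmount openmount;
+            struct args_ready ready;
+            struct args_fail fail;
+            struct args_setpipefd setpipefd;
+            struct args_timeout timeout;
+            struct args_requester requester;
+            struct args_expire expire;
+            struct args_askumount askumount;
+            struct args_ismountpoint ismountpoint;
+        };
+        char path[];
+    };
+
+    #define AUTOFS_DEV_IOCTL_OPENMOUNT \
+        _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD, struct autofs_dev_ioctl)
+    #define AUTOFS_DEV_IOCTL_CLOSEMOUNT \
+        _IOWR(AUTOFS_IOCTL, AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD, struct autofs_dev_ioctl)
+
+    /* Open a descriptor for the autofs mount at 'path' with device number
+     * 'dev' via the control device 'devfd'. Returns the descriptor or a
+     * negative errno value. */
+    static int autofs_open_mount(int devfd, const char *path, dev_t dev)
+    {
+        size_t plen = strlen(path) + 1;
+        unsigned int maj = major(dev), min = minor(dev);
+        struct autofs_dev_ioctl *req;
+        int ret;
+
+        assert(path[0] == '/');
+        req = calloc(1, sizeof(*req) + plen);
+        if (req == NULL)
+            return -ENOMEM;
+        req->ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR;
+        req->ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR;
+        req->size = sizeof(*req) + plen;
+        req->ioctlfd = -1;
+        /* Assumed to match the kernel's new_encode_dev() packing. */
+        req->openmount.devid = (min & 0xff) | (maj << 8) | ((min & ~0xffu) << 12);
+        memcpy(req->path, path, plen);
+
+        ret = ioctl(devfd, AUTOFS_DEV_IOCTL_OPENMOUNT, req);
+        ret = (ret == -1) ? -errno : req->ioctlfd;
+        free(req);
+        return ret;
+    }
+
+    /* Release a descriptor obtained from autofs_open_mount(). */
+    static int autofs_close_mount(int devfd, int ioctlfd)
+    {
+        struct autofs_dev_ioctl req = {
+            .ver_major = AUTOFS_DEV_IOCTL_VERSION_MAJOR,
+            .ver_minor = AUTOFS_DEV_IOCTL_VERSION_MINOR,
+            .size = sizeof(struct autofs_dev_ioctl),
+            .ioctlfd = ioctlfd,
+        };
+
+        return ioctl(devfd, AUTOFS_DEV_IOCTL_CLOSEMOUNT, &req) == -1 ? -errno : 0;
+    }
+
+On success, the returned descriptor can be used as the ioctlfd of later
+calls. As noted above, close(2) works equally well in place of
+autofs_close_mount().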
+
+
+AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD
+--------------------------------------------------------
+
+Return mount and expire result status from user space to the kernel.
+Both of these calls require an initialized struct autofs_dev_ioctl
+with the ioctlfd field set to the descriptor obtained from the open
+call and the token field of struct args_ready or struct args_fail set
+to the wait queue token number, received by user space in the foregoing
+mount or expire request. The status field of struct args_fail is set to
+the errno of the operation. It is set to 0 on success.
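+
+A sketch of how a daemon might choose between the two replies is shown
+below. The result_msg wrapper and build_result() are hypothetical. Storing
+the errno value negated in the status field is an assumption about the
+sign convention and should be checked against the module::
+
+    #include <assert.h>
+    #include <errno.h>
+    #include <stdint.h>
+
+    /* Assumed copies of the argument structures. */
+    struct args_ready { uint32_t token; };
+    struct args_fail  { uint32_t token; int32_t status; };
+
+    /* Hypothetical tag naming which of the two ioctls to issue. */
+    enum result_cmd { SEND_READY, SEND_FAIL };
+
+    struct result_msg {
+        enum result_cmd cmd;
+        union {
+            struct args_ready ready;
+            struct args_fail fail;
+        };
+    };
+
+    /* Build the reply for wait queue 'token'. 'err' is 0 on success or a
+     * positive errno value such as ENOENT, which is stored negated. */
+    static struct result_msg build_result(uint32_t token, int err)
+    {
+        struct result_msg msg = { .cmd = SEND_READY };
+
+        assert(err >= 0);
+        if (err == 0) {
+            msg.ready.token = token;
+        } else {
+            msg.cmd = SEND_FAIL;
+            msg.fail.token = token;
+            msg.fail.status = -err;
+        }
+        return msg;
+    }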
+
+
+AUTOFS_DEV_IOCTL_SETPIPEFD_CMD
+------------------------------
+
+Set the pipe file descriptor used for kernel communication to the daemon.
+Normally this is set at mount time using an option, but when reconnecting
+to an existing mount we need to use this to tell the autofs mount about
+the new kernel pipe descriptor. In order to protect mounts against
+incorrectly setting the pipe descriptor we also require that the autofs
+mount be catatonic (see next call).
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call and
+the pipefd field of struct args_setpipefd set to the descriptor of the pipe.
+On success the call also sets the process group id used to identify the
+controlling process (e.g. the owning automount(8) daemon) to the process
+group of the caller.
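+
+The ordering constraint can be illustrated with a toy model in which
+SETPIPEFD is refused until the mount has been made catatonic. Everything
+below is a hypothetical stand-in for the ioctls, not the module's code.
+This includes struct toy_mount, the toy functions, the -EBUSY error, and
+the mount leaving the catatonic state once a pipe is set::
+
+    #include <assert.h>
+    #include <errno.h>
+
+    /* Toy model of the state of one autofs mount (hypothetical). */
+    struct toy_mount {
+        int catatonic;
+        int pipefd;
+    };
+
+    /* Stand-in for the CATATONIC ioctl: the old pipe is released. */
+    static int toy_catatonic(struct toy_mount *m)
+    {
+        m->catatonic = 1;
+        m->pipefd = -1;
+        return 0;
+    }
+
+    /* Stand-in for the SETPIPEFD ioctl; the error code is an assumption. */
+    static int toy_setpipefd(struct toy_mount *m, int pipefd)
+    {
+        if (!m->catatonic)
+            return -EBUSY;          /* refuse to replace a live pipe */
+        m->pipefd = pipefd;
+        m->catatonic = 0;           /* assumed: the mount is active again */
+        return 0;
+    }
+
+    /* Reconnect an existing mount to a new daemon pipe. */
+    static int reconnect_pipe(struct toy_mount *m, int pipefd)
+    {
+        int err;
+
+        assert(m != NULL && pipefd >= 0);
+        err = toy_catatonic(m);     /* SETPIPEFD requires a catatonic mount */
+        if (err < 0)
+            return err;
+        return toy_setpipefd(m, pipefd);
+    }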
+
+
+AUTOFS_DEV_IOCTL_CATATONIC_CMD
+------------------------------
+
+Make the autofs mount point catatonic. The autofs mount will no longer
+issue mount requests, the kernel communication pipe descriptor is released
+and any remaining waits in the queue are released.
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call.
+
+
+AUTOFS_DEV_IOCTL_TIMEOUT_CMD
+----------------------------
+
+Set the expire timeout for mounts within an autofs mount point.
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call.
+
+
+AUTOFS_DEV_IOCTL_REQUESTER_CMD
+------------------------------
+
+Return the uid and gid of the last process to successfully trigger the
+mount on the given path dentry.
+
+The call requires an initialized struct autofs_dev_ioctl with the path
+field set to the mount point in question and the size field adjusted
+appropriately. Upon return the uid field of struct args_requester contains
+the uid and the gid field contains the gid.
+
+When reconstructing an autofs mount tree with active mounts we need to
+re-connect to mounts that may have used the original process uid and
+gid (or string variations of them) for mount lookups within the map entry.
+This call provides the ability to obtain this uid and gid so they may be
+used by user space for the mount map lookups.
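+
+As an illustration of that use, the hypothetical helper below expands
+$UID and $GID in a map entry string with the uid and gid fields of struct
+args_requester. The variable names are assumed to match those used in
+autofs maps. The simplified expansion rules are an assumption, not the
+daemon's implementation::
+
+    #include <assert.h>
+    #include <stdio.h>
+    #include <string.h>
+
+    /* Expand "$UID" and "$GID" in 'tmpl' into 'out' (size 'outlen').
+     * Returns 0 on success or -1 if the result does not fit. */
+    static int expand_ids(const char *tmpl, unsigned int uid,
+                          unsigned int gid, char *out, size_t outlen)
+    {
+        size_t used = 0;
+
+        assert(tmpl != NULL && out != NULL && outlen > 0);
+        while (*tmpl != '\0') {
+            char num[16];
+            const char *piece = NULL;
+            size_t plen;
+
+            if (strncmp(tmpl, "$UID", 4) == 0) {
+                snprintf(num, sizeof(num), "%u", uid);
+                piece = num;
+                tmpl += 4;
+            } else if (strncmp(tmpl, "$GID", 4) == 0) {
+                snprintf(num, sizeof(num), "%u", gid);
+                piece = num;
+                tmpl += 4;
+            }
+            if (piece != NULL) {
+                plen = strlen(piece);
+            } else {
+                piece = tmpl++;     /* copy one literal character */
+                plen = 1;
+            }
+            if (used + plen >= outlen)
+                return -1;          /* keep room for the terminator */
+            memcpy(out + used, piece, plen);
+            used += plen;
+        }
+        out[used] = '\0';
+        return 0;
+    }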
+
+
+AUTOFS_DEV_IOCTL_EXPIRE_CMD
+---------------------------
+
+Issue an expire request to the kernel for an autofs mount. Typically
+this ioctl is called until no further expire candidates are found.
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call. In
+addition, an immediate expire that's independent of the mount timeout,
+and a forced expire that's independent of whether the mount is busy,
+can be requested by setting the how field of struct args_expire to
+AUTOFS_EXP_IMMEDIATE or AUTOFS_EXP_FORCED, respectively. If no
+expire candidates can be found, the ioctl returns -1 with errno set to
+EAGAIN.
+
+This call causes the kernel module to check the mount corresponding
+to the given ioctlfd for mounts that can be expired, issue an expire
+request back to the daemon and wait for completion.
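+
+The "call until no further expire candidates are found" pattern can be
+sketched as a loop that stops when the call fails with EAGAIN. The
+expire_fn callback stands in for a hypothetical wrapper around this ioctl
+that, like ioctl(2), returns 0 or -1 with errno set. fake_expire() is a
+toy used only to exercise the loop::
+
+    #include <assert.h>
+    #include <errno.h>
+
+    /* Hypothetical wrapper for one expire request on 'ioctlfd'; like
+     * ioctl(2) it returns 0 on success or -1 with errno set. */
+    typedef int (*expire_fn)(void *ctx, int ioctlfd, unsigned int how);
+
+    /* Expire until no candidates remain (EAGAIN). Returns the number of
+     * successful expires, or a negative errno value on any other failure. */
+    static int expire_all(expire_fn do_expire, void *ctx, int ioctlfd,
+                          unsigned int how)
+    {
+        int count = 0;
+
+        assert(do_expire != NULL);
+        for (;;) {
+            if (do_expire(ctx, ioctlfd, how) == 0) {
+                count++;
+                continue;
+            }
+            if (errno == EAGAIN)
+                return count;       /* nothing left to expire */
+            return -errno;
+        }
+    }
+
+    /* Toy stand-in for the ioctl: '*ctx' mounts are expirable. */
+    static int fake_expire(void *ctx, int ioctlfd, unsigned int how)
+    {
+        int *remaining = ctx;
+
+        (void)ioctlfd;
+        (void)how;
+        if (*remaining > 0) {
+            (*remaining)--;
+            return 0;
+        }
+        errno = EAGAIN;
+        return -1;
+    }
+
+In a daemon, how would carry AUTOFS_EXP_IMMEDIATE or AUTOFS_EXP_FORCED
+when those behaviours are wanted, and 0 otherwise.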
+
+
+AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD
+------------------------------
+
+Check if an autofs mount point is in use.
+
+The call requires an initialized struct autofs_dev_ioctl with the
+ioctlfd field set to the descriptor obtained from the open call and
+it returns the result in the may_umount field of struct args_askumount,
+1 for busy and 0 otherwise.
+
+
+AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
+---------------------------------
+
+Check if the given path is a mountpoint.
+
+The call requires an initialized struct autofs_dev_ioctl. There are two
+possible variations. Both use the path field set to the path of the mount
+point to check and the size field adjusted appropriately. One uses the
+ioctlfd field to identify a specific mount point to check, while the other
+variation uses the path and, optionally, the in.type field of struct
+args_ismountpoint set to an autofs mount type. The call returns 1 if this
+is a mount point and sets the out.devid field to the device number of the
+mount and the out.magic field to the relevant super block magic number
+(described below), or 0 if it isn't a mountpoint. In both cases the device
+number (as returned by new_encode_dev()) is returned in the out.devid field.
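+
+For reference, the packing used by new_encode_dev() can be reproduced in
+user space from the major and minor numbers. One use is comparing an
+out.devid value with st_dev from stat(2). This sketch is based on the
+kernel's new_encode_dev() and new_decode_dev() and uses the major(),
+minor() and makedev() macros from <sys/sysmacros.h>. It assumes majors
+below 4096 and minors that fit in 20 bits, which is the range the 32-bit
+format can hold::
+
+    #include <assert.h>
+    #include <stdint.h>
+    #include <sys/sysmacros.h>
+    #include <sys/types.h>
+
+    /* Pack a device number as new_encode_dev() does: low 8 minor bits,
+     * then 12 major bits, then the remaining 12 minor bits. */
+    static uint32_t encode_devid(dev_t dev)
+    {
+        uint32_t maj = major(dev);
+        uint32_t min = minor(dev);
+
+        assert(maj < 0x1000 && min < 0x100000);
+        return (min & 0xff) | (maj << 8) | ((min & ~0xffu) << 12);
+    }
+
+    /* Inverse of encode_devid(), following new_decode_dev(). */
+    static dev_t decode_devid(uint32_t devid)
+    {
+        uint32_t maj = (devid & 0xfff00) >> 8;
+        uint32_t min = (devid & 0xff) | ((devid >> 12) & 0xfff00);
+
+        return makedev(maj, min);
+    }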
+
+If supplied with a file descriptor, we're looking for a specific mount,
+not necessarily at the top of the mounted stack. In this case the path
+the descriptor corresponds to is considered a mountpoint if it is itself
+a mountpoint or contains a mount, such as a multi-mount without a root
+mount. In this case we return 1 if the descriptor corresponds to a mount
+point, and also return the super magic of the covering mount if there
+is one, or 0 if it isn't a mountpoint.
+
+If a path is supplied (and the ioctlfd field is set to -1), then the path
+is looked up and is checked to see if it is the root of a mount. If a
+type is also given, we are looking for a particular autofs mount, and if
+a match isn't found a fail is returned. If the located path is the
+root of a mount, 1 is returned along with the super magic of the mount,
+or 0 otherwise.
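+
+A caller might map the results described above onto a few cases, as
+sketched below. The mp_kind classification is hypothetical.
+AUTOFS_SUPER_MAGIC (0x0187) is an assumed copy of the value in
+<linux/magic.h>. A zero magic is treated as "mounted, but no magic
+reported", which covers the descriptor case with no covering mount::
+
+    #include <assert.h>
+    #include <stdint.h>
+
+    #define AUTOFS_SUPER_MAGIC 0x0187   /* assumed copy of <linux/magic.h> */
+
+    /* Hypothetical classification of an ISMOUNTPOINT result. */
+    enum mp_kind {
+        MP_NONE,        /* not a mountpoint */
+        MP_MOUNTED,     /* a mountpoint, but no magic was reported */
+        MP_AUTOFS,      /* the reported mount is autofs */
+        MP_OTHER,       /* the reported mount is another file system */
+    };
+
+    /* 'ret' is the value returned by the call and 'magic' the out.magic
+     * field it filled in. */
+    static enum mp_kind classify_mountpoint(int ret, uint32_t magic)
+    {
+        assert(ret >= 0);
+        if (ret == 0)
+            return MP_NONE;
+        if (magic == 0)
+            return MP_MOUNTED;
+        return magic == AUTOFS_SUPER_MAGIC ? MP_AUTOFS : MP_OTHER;
+    }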
diff --git a/Documentation/filesystems/autofs-mount-control.txt b/Documentation/filesystems/autofs-mount-control.txt
deleted file mode 100644
index acc02fc57993..000000000000
--- a/Documentation/filesystems/autofs-mount-control.txt
+++ /dev/null
@@ -1,408 +0,0 @@
-
-Miscellaneous Device control operations for the autofs kernel module
-====================================================================
-
-The problem
-===========
-
-There is a problem with active restarts in autofs (that is to say
-restarting autofs when there are busy mounts).
-
-During normal operation autofs uses a file descriptor opened on the
-directory that is being managed in order to be able to issue control
-operations. Using a file descriptor gives ioctl operations access to
-autofs specific information stored in the super block. The operations
-are things such as setting an autofs mount catatonic, setting the
-expire timeout and requesting expire checks. As is explained below,
-certain types of autofs triggered mounts can end up covering an autofs
-mount itself which prevents us being able to use open(2) to obtain a
-file descriptor for these operations if we don't already have one open.
-
-Currently autofs uses "umount -l" (lazy umount) to clear active mounts
-at restart. While using lazy umount works for most cases, anything that
-needs to walk back up the mount tree to construct a path, such as
-getcwd(2) and the proc file system /proc/<pid>/cwd, no longer works
-because the point from which the path is constructed has been detached
-from the mount tree.
-
-The actual problem with autofs is that it can't reconnect to existing
-mounts. Immediately one thinks of just adding the ability to remount
-autofs file systems would solve it, but alas, that can't work. This is
-because autofs direct mounts and the implementation of "on demand mount
-and expire" of nested mount trees have the file system mounted directly
-on top of the mount trigger directory dentry.
-
-For example, there are two types of automount maps, direct (in the kernel
-module source you will see a third type called an offset, which is just
-a direct mount in disguise) and indirect.
-
-Here is a master map with direct and indirect map entries:
-
-/-      /etc/auto.direct
-/test   /etc/auto.indirect
-
-and the corresponding map files:
-
-/etc/auto.direct:
-
-/automount/dparse/g6  budgie:/autofs/export1
-/automount/dparse/g1  shark:/autofs/export1
-and so on.
-
-/etc/auto.indirect:
-
-g1    shark:/autofs/export1
-g6    budgie:/autofs/export1
-and so on.
-
-For the above indirect map an autofs file system is mounted on /test and
-mounts are triggered for each sub-directory key by the inode lookup
-operation. So we see a mount of shark:/autofs/export1 on /test/g1, for
-example.
-
-The way that direct mounts are handled is by making an autofs mount on
-each full path, such as /automount/dparse/g1, and using it as a mount
-trigger. So when we walk on the path we mount shark:/autofs/export1 "on
-top of this mount point". Since these are always directories we can
-use the follow_link inode operation to trigger the mount.
-
-But, each entry in direct and indirect maps can have offsets (making
-them multi-mount map entries).
-
-For example, an indirect mount map entry could also be:
-
-g1  \
-   /        shark:/autofs/export5/testing/test \
-   /s1      shark:/autofs/export/testing/test/s1 \
-   /s2      shark:/autofs/export5/testing/test/s2 \
-   /s1/ss1  shark:/autofs/export1 \
-   /s2/ss2  shark:/autofs/export2
-
-and a similarly a direct mount map entry could also be:
-
-/automount/dparse/g1 \
-    /       shark:/autofs/export5/testing/test \
-    /s1     shark:/autofs/export/testing/test/s1 \
-    /s2     shark:/autofs/export5/testing/test/s2 \
-    /s1/ss1 shark:/autofs/export2 \
-    /s2/ss2 shark:/autofs/export2
-
-One of the issues with version 4 of autofs was that, when mounting an
-entry with a large number of offsets, possibly with nesting, we needed
-to mount and umount all of the offsets as a single unit. Not really a
-problem, except for people with a large number of offsets in map entries.
-This mechanism is used for the well known "hosts" map and we have seen
-cases (in 2.4) where the available number of mounts are exhausted or
-where the number of privileged ports available is exhausted.
-
-In version 5 we mount only as we go down the tree of offsets and
-similarly for expiring them which resolves the above problem. There is
-somewhat more detail to the implementation but it isn't needed for the
-sake of the problem explanation. The one important detail is that these
-offsets are implemented using the same mechanism as the direct mounts
-above and so the mount points can be covered by a mount.
-
-The current autofs implementation uses an ioctl file descriptor opened
-on the mount point for control operations. The references held by the
-descriptor are accounted for in checks made to determine if a mount is
-in use and is also used to access autofs file system information held
-in the mount super block. So the use of a file handle needs to be
-retained.
-
-
-The Solution
-============
-
-To be able to restart autofs leaving existing direct, indirect and
-offset mounts in place we need to be able to obtain a file handle
-for these potentially covered autofs mount points. Rather than just
-implement an isolated operation it was decided to re-implement the
-existing ioctl interface and add new operations to provide this
-functionality.
-
-In addition, to be able to reconstruct a mount tree that has busy mounts,
-the uid and gid of the last user that triggered the mount needs to be
-available because these can be used as macro substitution variables in
-autofs maps. They are recorded at mount request time and an operation
-has been added to retrieve them.
-
-Since we're re-implementing the control interface, a couple of other
-problems with the existing interface have been addressed. First, when
-a mount or expire operation completes a status is returned to the
-kernel by either a "send ready" or a "send fail" operation. The
-"send fail" operation of the ioctl interface could only ever send
-ENOENT so the re-implementation allows user space to send an actual
-status. Another expensive operation in user space, for those using
-very large maps, is discovering if a mount is present. Usually this
-involves scanning /proc/mounts and since it needs to be done quite
-often it can introduce significant overhead when there are many entries
-in the mount table. An operation to lookup the mount status of a mount
-point dentry (covered or not) has also been added.
-
-Current kernel development policy recommends avoiding the use of the
-ioctl mechanism in favor of systems such as Netlink. An implementation
-using this system was attempted to evaluate its suitability and it was
-found to be inadequate, in this case. The Generic Netlink system was
-used for this as raw Netlink would lead to a significant increase in
-complexity. There's no question that the Generic Netlink system is an
-elegant solution for common case ioctl functions but it's not a complete
-replacement probably because its primary purpose in life is to be a
-message bus implementation rather than specifically an ioctl replacement.
-While it would be possible to work around this there is one concern
-that lead to the decision to not use it. This is that the autofs
-expire in the daemon has become far to complex because umount
-candidates are enumerated, almost for no other reason than to "count"
-the number of times to call the expire ioctl. This involves scanning
-the mount table which has proved to be a big overhead for users with
-large maps. The best way to improve this is try and get back to the
-way the expire was done long ago. That is, when an expire request is
-issued for a mount (file handle) we should continually call back to
-the daemon until we can't umount any more mounts, then return the
-appropriate status to the daemon. At the moment we just expire one
-mount at a time. A Generic Netlink implementation would exclude this
-possibility for future development due to the requirements of the
-message bus architecture.
-
-
-autofs Miscellaneous Device mount control interface
-====================================================
-
-The control interface is opening a device node, typically /dev/autofs.
-
-All the ioctls use a common structure to pass the needed parameter
-information and return operation results:
-
-struct autofs_dev_ioctl {
-	__u32 ver_major;
-	__u32 ver_minor;
-	__u32 size;             /* total size of data passed in
-				 * including this struct */
-	__s32 ioctlfd;          /* automount command fd */
-
-	/* Command parameters */
-	union {
-		struct args_protover		protover;
-		struct args_protosubver		protosubver;
-		struct args_openmount		openmount;
-		struct args_ready		ready;
-		struct args_fail		fail;
-		struct args_setpipefd		setpipefd;
-		struct args_timeout		timeout;
-		struct args_requester		requester;
-		struct args_expire		expire;
-		struct args_askumount		askumount;
-		struct args_ismountpoint	ismountpoint;
-	};
-
-	char path[0];
-};
-
-The ioctlfd field is a mount point file descriptor of an autofs mount
-point. It is returned by the open call and is used by all calls except
-the check for whether a given path is a mount point, where it may
-optionally be used to check a specific mount corresponding to a given
-mount point file descriptor, and when requesting the uid and gid of the
-last successful mount on a directory within the autofs file system.
-
-The union is used to communicate parameters and results of calls made
-as described below.
-
-The path field is used to pass a path where it is needed and the size field
-is used account for the increased structure length when translating the
-structure sent from user space.
-
-This structure can be initialized before setting specific fields by using
-the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *).
-
-All of the ioctls perform a copy of this structure from user space to
-kernel space and return -EINVAL if the size parameter is smaller than
-the structure size itself, -ENOMEM if the kernel memory allocation fails
-or -EFAULT if the copy itself fails. Other checks include a version check
-of the compiled in user space version against the module version and a
-mismatch results in a -EINVAL return. If the size field is greater than
-the structure size then a path is assumed to be present and is checked to
-ensure it begins with a "/" and is NULL terminated, otherwise -EINVAL is
-returned. Following these checks, for all ioctl commands except
-AUTOFS_DEV_IOCTL_VERSION_CMD, AUTOFS_DEV_IOCTL_OPENMOUNT_CMD and
-AUTOFS_DEV_IOCTL_CLOSEMOUNT_CMD the ioctlfd is validated and if it is
-not a valid descriptor or doesn't correspond to an autofs mount point
-an error of -EBADF, -ENOTTY or -EINVAL (not an autofs descriptor) is
-returned.
-
-
-The ioctls
-==========
-
-An example of an implementation which uses this interface can be seen
-in autofs version 5.0.4 and later in file lib/dev-ioctl-lib.c of the
-distribution tar available for download from kernel.org in directory
-/pub/linux/daemons/autofs/v5.
-
-The device node ioctl operations implemented by this interface are:
-
-
-AUTOFS_DEV_IOCTL_VERSION
-------------------------
-
-Get the major and minor version of the autofs device ioctl kernel module
-implementation. It requires an initialized struct autofs_dev_ioctl as an
-input parameter and sets the version information in the passed in structure.
-It returns 0 on success or the error -EINVAL if a version mismatch is
-detected.
-
-
-AUTOFS_DEV_IOCTL_PROTOVER_CMD and AUTOFS_DEV_IOCTL_PROTOSUBVER_CMD
-------------------------------------------------------------------
-
-Get the major and minor version of the autofs protocol version understood
-by loaded module. This call requires an initialized struct autofs_dev_ioctl
-with the ioctlfd field set to a valid autofs mount point descriptor
-and sets the requested version number in version field of struct args_protover
-or sub_version field of struct args_protosubver. These commands return
-0 on success or one of the negative error codes if validation fails.
-
-
-AUTOFS_DEV_IOCTL_OPENMOUNT and AUTOFS_DEV_IOCTL_CLOSEMOUNT
-----------------------------------------------------------
-
-Obtain and release a file descriptor for an autofs managed mount point
-path. The open call requires an initialized struct autofs_dev_ioctl with
-the path field set and the size field adjusted appropriately as well
-as the devid field of struct args_openmount set to the device number of
-the autofs mount. The device number can be obtained from the mount options
-shown in /proc/mounts. The close call requires an initialized struct
-autofs_dev_ioct with the ioctlfd field set to the descriptor obtained
-from the open call. The release of the file descriptor can also be done
-with close(2) so any open descriptors will also be closed at process exit.
-The close call is included in the implemented operations largely for
-completeness and to provide for a consistent user space implementation.
-
-
-AUTOFS_DEV_IOCTL_READY_CMD and AUTOFS_DEV_IOCTL_FAIL_CMD
---------------------------------------------------------
-
-Return mount and expire result status from user space to the kernel.
-Both of these calls require an initialized struct autofs_dev_ioctl
-with the ioctlfd field set to the descriptor obtained from the open
-call and the token field of struct args_ready or struct args_fail set
-to the wait queue token number, received by user space in the foregoing
-mount or expire request. The status field of struct args_fail is set to
-the errno of the operation. It is set to 0 on success.
-
-
-AUTOFS_DEV_IOCTL_SETPIPEFD_CMD
-------------------------------
-
-Set the pipe file descriptor used for kernel communication to the daemon.
-Normally this is set at mount time using an option but when reconnecting
-to a existing mount we need to use this to tell the autofs mount about
-the new kernel pipe descriptor. In order to protect mounts against
-incorrectly setting the pipe descriptor we also require that the autofs
-mount be catatonic (see next call).
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call and
-the pipefd field of struct args_setpipefd set to descriptor of the pipe.
-On success the call also sets the process group id used to identify the
-controlling process (eg. the owning automount(8) daemon) to the process
-group of the caller.
-
-
-AUTOFS_DEV_IOCTL_CATATONIC_CMD
-------------------------------
-
-Make the autofs mount point catatonic. The autofs mount will no longer
-issue mount requests, the kernel communication pipe descriptor is released
-and any remaining waits in the queue released.
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call.
-
-
-AUTOFS_DEV_IOCTL_TIMEOUT_CMD
-----------------------------
-
-Set the expire timeout for mounts within an autofs mount point.
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call.
-
-
-AUTOFS_DEV_IOCTL_REQUESTER_CMD
-------------------------------
-
-Return the uid and gid of the last process to successfully trigger a the
-mount on the given path dentry.
-
-The call requires an initialized struct autofs_dev_ioctl with the path
-field set to the mount point in question and the size field adjusted
-appropriately. Upon return the uid field of struct args_requester contains
-the uid and gid field the gid.
-
-When reconstructing an autofs mount tree with active mounts we need to
-re-connect to mounts that may have used the original process uid and
-gid (or string variations of them) for mount lookups within the map entry.
-This call provides the ability to obtain this uid and gid so they may be
-used by user space for the mount map lookups.
-
-
-AUTOFS_DEV_IOCTL_EXPIRE_CMD
----------------------------
-
-Issue an expire request to the kernel for an autofs mount. Typically
-this ioctl is called until no further expire candidates are found.
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call. In
-addition an immediate expire that's independent of the mount timeout,
-and a forced expire that's independent of whether the mount is busy,
-can be requested by setting the how field of struct args_expire to
-AUTOFS_EXP_IMMEDIATE or AUTOFS_EXP_FORCED, respectively . If no
-expire candidates can be found the ioctl returns -1 with errno set to
-EAGAIN.
-
-This call causes the kernel module to check the mount corresponding
-to the given ioctlfd for mounts that can be expired, issues an expire
-request back to the daemon and waits for completion.
-
-AUTOFS_DEV_IOCTL_ASKUMOUNT_CMD
-------------------------------
-
-Checks if an autofs mount point is in use.
-
-The call requires an initialized struct autofs_dev_ioctl with the
-ioctlfd field set to the descriptor obtained from the open call and
-it returns the result in the may_umount field of struct args_askumount,
-1 for busy and 0 otherwise.
-
-
-AUTOFS_DEV_IOCTL_ISMOUNTPOINT_CMD
----------------------------------
-
-Check if the given path is a mountpoint.
-
-The call requires an initialized struct autofs_dev_ioctl. There are two
-possible variations. Both use the path field set to the path of the mount
-point to check and the size field adjusted appropriately. One uses the
-ioctlfd field to identify a specific mount point to check while the other
-variation uses the path and optionally in.type field of struct args_ismountpoint
-set to an autofs mount type. The call returns 1 if this is a mount point
-and sets out.devid field to the device number of the mount and out.magic
-field to the relevant super block magic number (described below) or 0 if
-it isn't a mountpoint. In both cases the the device number (as returned
-by new_encode_dev()) is returned in out.devid field.
-
-If supplied with a file descriptor we're looking for a specific mount,
-not necessarily at the top of the mounted stack. In this case the path
-the descriptor corresponds to is considered a mountpoint if it is itself
-a mountpoint or contains a mount, such as a multi-mount without a root
-mount. In this case we return 1 if the descriptor corresponds to a mount
-point and and also returns the super magic of the covering mount if there
-is one or 0 if it isn't a mountpoint.
-
-If a path is supplied (and the ioctlfd field is set to -1) then the path
-is looked up and is checked to see if it is the root of a mount. If a
-type is also given we are looking for a particular autofs mount and if
-a match isn't found a fail is returned. If the the located path is the
-root of a mount 1 is returned along with the super magic of the mount
-or 0 otherwise.
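The AUTOFS_DEV_IOCTL_EXPIRE_CMD text above describes a retry loop: keep issuing expire requests until the call fails with EAGAIN. Below is a minimal C sketch of that loop. It does not call ioctl(2) itself. The hypothetical expire_fn callback stands in for something like an ioctl with AUTOFS_DEV_IOCTL_EXPIRE and an initialized struct autofs_dev_ioctl, and fake_expire is a hypothetical test double, so the control flow can be shown without a live autofs mount.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for one expire request, e.g. an ioctl() with
 * AUTOFS_DEV_IOCTL_EXPIRE on the control device. It returns 0 when a
 * mount was expired and -1 with errno set otherwise.
 */
typedef int (*expire_fn)(void *ctx);

/*
 * Issue expire requests until the kernel reports that no candidates are
 * left (-1 with errno == EAGAIN). Returns how many requests succeeded,
 * or -1 on any other error, leaving errno as set by the failing call.
 */
static int expire_all(expire_fn do_expire, void *ctx)
{
	int count = 0;

	for (;;) {
		if (do_expire(ctx) == 0) {
			count++;
			continue;
		}
		if (errno == EAGAIN)
			return count;	/* nothing more to expire */
		return -1;		/* genuine failure */
	}
}

/* Hypothetical test double: *ctx holds the number of expirable mounts. */
static int fake_expire(void *ctx)
{
	int *candidates = ctx;

	if (*candidates > 0) {
		(*candidates)--;
		return 0;
	}
	errno = EAGAIN;
	return -1;
}
```

Errors other than EAGAIN are passed back with errno intact, so a daemon built around this pattern could log them and retry later.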
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 0598bc52abdc..c9480138d47e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -51,6 +51,7 @@ Documentation for filesystem implementations.
    affs
    afs
    autofs
+   autofs-mount-control
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From c54ad9a4e8faf080e6b395cc4b8298dfc5170255 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:52 +0100
Subject: docs: filesystems: convert befs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/3e29ea6df6cd569021cfa953ccb8ed7dfc146f3d.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/befs.rst  | 128 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/befs.txt  | 117 --------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 129 insertions(+), 117 deletions(-)
 create mode 100644 Documentation/filesystems/befs.rst
 delete mode 100644 Documentation/filesystems/befs.txt

diff --git a/Documentation/filesystems/befs.rst b/Documentation/filesystems/befs.rst
new file mode 100644
index 000000000000..79f9740d76ff
--- /dev/null
+++ b/Documentation/filesystems/befs.rst
@@ -0,0 +1,128 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
+BeOS filesystem for Linux
+=========================
+
+Document last updated: Dec 6, 2001
+
+Warning
+=======
+Make sure you understand that this is alpha software.  This means that the
+implementation is neither complete nor well-tested.
+
+I DISCLAIM ALL RESPONSIBILITY FOR ANY POSSIBLE BAD EFFECTS OF THIS CODE!
+
+License
+=======
+This software is covered by the GNU General Public License.
+See the file COPYING for the complete text of the license.
+Or the GNU website: <http://www.gnu.org/licenses/licenses.html>
+
+Author
+======
+Most of the code was written by Will Dyson <will_dyson@pobox.com>.
+He has been working on the code since Aug 13, 2001. See the changelog for
+details.
+
+Original Author: Makoto Kato <m_kato@ga2.so-net.ne.jp>
+
+His original code can still be found at:
+<http://hp.vector.co.jp/authors/VA008030/bfs/>
+
+Does anyone know of a more current email address for Makoto? He doesn't
+respond to the address given above...
+
+This filesystem doesn't have a maintainer.
+
+What is this Driver?
+====================
+This module implements the native filesystem of BeOS http://www.beincorporated.com/
+for Linux 2.4.1 and later kernels. Currently it is a read-only
+implementation.
+
+Which is it, BFS or BEFS?
+=========================
+Be, Inc. said, "BeOS Filesystem is officially called BFS, not BeFS".
+But the UnixWare Boot Filesystem is also called bfs, and it is already in
+the kernel. Because of this naming conflict, on Linux the BeOS
+filesystem is called befs.
+
+How to Install
+==============
+Step 1. Install the BeFS patch into the Linux source tree.
+
+Apply the patchfile to your kernel source tree.
+Assuming that your kernel source is in /foo/bar/linux and the patchfile
+is called patch-befs-xxx, you would do the following::
+
+	cd /foo/bar/linux
+	patch -p1 < /path/to/patch-befs-xxx
+
+If the patching step fails (i.e. there are rejected hunks), you can try to
+figure it out yourself (it shouldn't be hard), or mail the maintainer
+(Will Dyson <will_dyson@pobox.com>) for help.
+
+Step 2. Configure and build the kernel
+
+The Linux kernel has many compile-time options. Most of them are beyond the
+scope of this document. I suggest the Kernel-HOWTO document as a good general
+reference on this topic. http://www.linuxdocs.org/HOWTOs/Kernel-HOWTO-4.html
+
+However, to use the BeFS module, you must enable it at configure time::
+
+	cd /foo/bar/linux
+	make menuconfig (or xconfig)
+
+The BeFS module is not a standard part of the Linux kernel, so you must first
+enable support for experimental code under the "Code maturity level" menu.
+
+Then, under the "Filesystems" menu will be an option called "BeFS
+filesystem (experimental)", or something like that. Enable that option
+(it is fine to make it a module).
+
+Save your kernel configuration and then build your kernel.
+
+Step 3. Install
+
+See the kernel howto <http://www.linux.com/howto/Kernel-HOWTO.html> for
+instructions on this critical step.
+
+Using BFS
+=========
+To use the BeOS filesystem, use filesystem type 'befs'.
+
+For example::
+
+    mount -t befs /dev/fd0 /beos
+
+Mount Options
+=============
+
+=============  ===========================================================
+uid=nnn        All files in the partition will be owned by user id nnn.
+gid=nnn        All files in the partition will be in group nnn.
+iocharset=xxx  Use xxx as the name of the NLS translation table.
+debug          The driver will output debugging information to the syslog.
+=============  ===========================================================
+
+How to Get the Latest Version
+=============================
+
+The latest version is currently available at:
+<http://befs-driver.sourceforge.net/>
+
+Any Known Bugs?
+===============
+As of Jan 20, 2002:
+
+	None
+
+Special Thanks
+==============
+Dominic Giampaolo ... Writing "Practical File System Design with the Be File System"
+
+Hiroyuki Yamada   ... Testing LinuxPPC.
+
+
+
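The befs mount options in the table above reach the kernel as a single comma-separated string. The same string is the data argument of mount(2). The C sketch below assembles it. The helper names befs_mount_opts() and append_opt() are hypothetical. Leaving an option out when its value is negative or NULL is a convention assumed here for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append key and value to buf, preceded by a comma unless buf is still
 * empty. Returns 0 on success, -1 if the result would not fit.
 */
static int append_opt(char *buf, size_t len, size_t *used,
		      const char *key, const char *value)
{
	int n = snprintf(buf + *used, len - *used, "%s%s%s",
			 *used ? "," : "", key, value);

	if (n < 0 || (size_t)n >= len - *used)
		return -1;	/* output was truncated */
	*used += (size_t)n;
	return 0;
}

/*
 * Build a befs option string such as "uid=1000,gid=100,iocharset=utf8,debug".
 * A negative uid/gid or a NULL iocharset leaves that option out.
 */
static int befs_mount_opts(char *buf, size_t len, long uid, long gid,
			   const char *iocharset, int debug)
{
	char num[32];
	size_t used = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	if (uid >= 0) {
		snprintf(num, sizeof(num), "%ld", uid);
		if (append_opt(buf, len, &used, "uid=", num))
			return -1;
	}
	if (gid >= 0) {
		snprintf(num, sizeof(num), "%ld", gid);
		if (append_opt(buf, len, &used, "gid=", num))
			return -1;
	}
	if (iocharset && append_opt(buf, len, &used, "iocharset=", iocharset))
		return -1;
	if (debug && append_opt(buf, len, &used, "debug", ""))
		return -1;
	return 0;
}
```

The resulting buffer could then be passed as, for example, mount("/dev/fd0", "/beos", "befs", MS_RDONLY, buf). MS_RDONLY fits the read-only status described above.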
diff --git a/Documentation/filesystems/befs.txt b/Documentation/filesystems/befs.txt
deleted file mode 100644
index da45e6c842b8..000000000000
--- a/Documentation/filesystems/befs.txt
+++ /dev/null
@@ -1,117 +0,0 @@
-BeOS filesystem for Linux
-
-Document last updated: Dec 6, 2001
-
-WARNING
-=======
-Make sure you understand that this is alpha software.  This means that the
-implementation is neither complete nor well-tested. 
-
-I DISCLAIM ALL RESPONSIBILITY FOR ANY POSSIBLE BAD EFFECTS OF THIS CODE!
-
-LICENSE
-=====
-This software is covered by the GNU General Public License. 
-See the file COPYING for the complete text of the license.
-Or the GNU website: <http://www.gnu.org/licenses/licenses.html>
-
-AUTHOR
-=====
-The largest part of the code written by Will Dyson <will_dyson@pobox.com>
-He has been working on the code since Aug 13, 2001. See the changelog for
-details.
-
-Original Author: Makoto Kato <m_kato@ga2.so-net.ne.jp>
-His original code can still be found at:
-<http://hp.vector.co.jp/authors/VA008030/bfs/>
-Does anyone know of a more current email address for Makoto? He doesn't
-respond to the address given above...
-
-This filesystem doesn't have a maintainer.
-
-WHAT IS THIS DRIVER?
-==================
-This module implements the native filesystem of BeOS http://www.beincorporated.com/ 
-for the linux 2.4.1 and later kernels. Currently it is a read-only
-implementation.
-
-Which is it, BFS or BEFS?
-================
-Be, Inc said, "BeOS Filesystem is officially called BFS, not BeFS". 
-But Unixware Boot Filesystem is called bfs, too. And they are already in
-the kernel. Because of this naming conflict, on Linux the BeOS
-filesystem is called befs.
-
-HOW TO INSTALL
-==============
-step 1.  Install the BeFS  patch into the source code tree of linux.
-
-Apply the patchfile to your kernel source tree.
-Assuming that your kernel source is in /foo/bar/linux and the patchfile
-is called patch-befs-xxx, you would do the following:
-
-	cd /foo/bar/linux
-	patch -p1 < /path/to/patch-befs-xxx
-
-if the patching step fails (i.e. there are rejected hunks), you can try to
-figure it out yourself (it shouldn't be hard), or mail the maintainer 
-(Will Dyson <will_dyson@pobox.com>) for help.
-
-step 2.  Configuration & make kernel
-
-The linux kernel has many compile-time options. Most of them are beyond the
-scope of this document. I suggest the Kernel-HOWTO document as a good general
-reference on this topic. http://www.linuxdocs.org/HOWTOs/Kernel-HOWTO-4.html 
-
-However, to use the BeFS module, you must enable it at configure time.
-
-	cd /foo/bar/linux
-	make menuconfig (or xconfig)
-
-The BeFS module is not a standard part of the linux kernel, so you must first
-enable support for experimental code under the "Code maturity level" menu.
-
-Then, under the "Filesystems" menu will be an option called "BeFS
-filesystem (experimental)", or something like that. Enable that option
-(it is fine to make it a module).
-
-Save your kernel configuration and then build your kernel.
-
-step 3.  Install
-
-See the kernel howto <http://www.linux.com/howto/Kernel-HOWTO.html> for
-instructions on this critical step.
-
-USING BFS
-=========
-To use the BeOS filesystem, use filesystem type 'befs'.
-
-ex)
-    mount -t befs /dev/fd0 /beos
-
-MOUNT OPTIONS
-=============
-uid=nnn        All files in the partition will be owned by user id nnn.
-gid=nnn	       All files in the partition will be in group nnn.
-iocharset=xxx  Use xxx as the name of the NLS translation table.
-debug          The driver will output debugging information to the syslog.
-
-HOW TO GET LASTEST VERSION
-==========================
-
-The latest version is currently available at:
-<http://befs-driver.sourceforge.net/>
-
-ANY KNOWN BUGS?
-===========
-As of Jan 20, 2002:
-	
-	None
-
-SPECIAL THANKS
-==============
-Dominic Giampalo ... Writing "Practical file system design with Be filesystem"
-Hiroyuki Yamada  ... Testing LinuxPPC.
-
-
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index c9480138d47e..98de437f5500 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -52,6 +52,7 @@ Documentation for filesystem implementations.
    afs
    autofs
    autofs-mount-control
+   befs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From ee68f34d7e7e553ffb74f09df0f3764fbfcf5d4b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:53 +0100
Subject: docs: filesystems: convert bfs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/93991bcc05e419368ee1e585c81057fb2c7c8d2b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/bfs.rst   | 60 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/bfs.txt   | 57 -----------------------------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 61 insertions(+), 57 deletions(-)
 create mode 100644 Documentation/filesystems/bfs.rst
 delete mode 100644 Documentation/filesystems/bfs.txt

diff --git a/Documentation/filesystems/bfs.rst b/Documentation/filesystems/bfs.rst
new file mode 100644
index 000000000000..ce14b9018807
--- /dev/null
+++ b/Documentation/filesystems/bfs.rst
@@ -0,0 +1,60 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+BFS Filesystem for Linux
+========================
+
+The BFS filesystem is used by SCO UnixWare OS for the /stand slice, which
+usually contains the kernel image and a few other files required for the
+boot process.
+
+To access the /stand partition under Linux you need to know the partition
+number, and the kernel must support UnixWare disk slices
+(CONFIG_UNIXWARE_DISKLABEL config option). However, BFS support does not
+depend on UnixWare disklabel support, because you can also mount a
+BFS filesystem via loopback::
+
+    # losetup /dev/loop0 stand.img
+    # mount -t bfs /dev/loop0 /mnt/stand
+
+where stand.img is a file containing the image of the BFS filesystem.
+When you have finished using it and unmounted it, you also need to
+deallocate the /dev/loop0 device with::
+
+    # losetup -d /dev/loop0
+
+You can simplify mounting by just typing::
+
+    # mount -t bfs -o loop stand.img /mnt/stand
+
+This will allocate the first available loopback device (and load the loop
+kernel module if necessary) automatically. If the loopback driver is not
+loaded automatically, make sure that you have compiled the module and
+that modprobe is functioning. Beware that umount will not deallocate the
+/dev/loopN device if the /etc/mtab file on your system is a symbolic link
+to /proc/mounts. You will need to do it manually using the "-d" switch of
+losetup(8). Read the losetup(8) manpage for more info.
+
+To create the BFS image under UnixWare you need to find out first which
+slice contains it. The command prtvtoc(1M) is your friend::
+
+    # prtvtoc /dev/rdsk/c0b0t0d0s0
+
+(assuming your root disk is on target=0, lun=0, bus=0, controller=0). Then you
+look for the slice with tag "STAND", which is usually slice 10 (``sa`` in
+the device name). With that information, use dd(1) to create the BFS image::
+
+    # umount /stand
+    # dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
+
+Just in case, you can verify that you have done the right thing by checking
+the magic number::
+
+    # od -Ad -tx4 stand.img | more
+
+The first 4 bytes should be 0x1badface.
+
+If you have any patches, questions or suggestions regarding this BFS
+implementation please contact the author:
+
+Tigran Aivazian <aivazian.tigran@gmail.com>
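The od(1) check above can also be done programmatically. The C sketch below reads the first four bytes of an image and compares them with 0x1badface. It assumes the magic number is stored as a little-endian 32-bit word, which matches od -tx4 printing 1badface on a little-endian (x86) host. The function name bfs_image_has_magic() is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define BFS_MAGIC 0x1BADFACEu	/* value quoted in the text above */

/*
 * Return 1 if the file at path starts with the BFS magic number,
 * 0 if it does not, and -1 if the first four bytes cannot be read.
 * The bytes are decoded as a little-endian 32-bit word (an assumption
 * about the on-disk layout).
 */
static int bfs_image_has_magic(const char *path)
{
	unsigned char b[4];
	uint32_t magic;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	if (fread(b, 1, sizeof(b), f) != sizeof(b)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	magic = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
		(uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
	return magic == BFS_MAGIC;
}
```

For an image produced with the dd(1) command above, bfs_image_has_magic("stand.img") should return 1.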
diff --git a/Documentation/filesystems/bfs.txt b/Documentation/filesystems/bfs.txt
deleted file mode 100644
index 843ce91a2e40..000000000000
--- a/Documentation/filesystems/bfs.txt
+++ /dev/null
@@ -1,57 +0,0 @@
-BFS FILESYSTEM FOR LINUX
-========================
-
-The BFS filesystem is used by SCO UnixWare OS for the /stand slice, which
-usually contains the kernel image and a few other files required for the
-boot process.
-
-In order to access /stand partition under Linux you obviously need to
-know the partition number and the kernel must support UnixWare disk slices
-(CONFIG_UNIXWARE_DISKLABEL config option). However BFS support does not
-depend on having UnixWare disklabel support because one can also mount
-BFS filesystem via loopback:
-
-# losetup /dev/loop0 stand.img
-# mount -t bfs /dev/loop0 /mnt/stand
-
-where stand.img is a file containing the image of BFS filesystem. 
-When you have finished using it and umounted you need to also deallocate
-/dev/loop0 device by:
-
-# losetup -d /dev/loop0
-
-You can simplify mounting by just typing:
-
-# mount -t bfs -o loop stand.img /mnt/stand
-
-this will allocate the first available loopback device (and load loop.o 
-kernel module if necessary) automatically. If the loopback driver is not
-loaded automatically, make sure that you have compiled the module and
-that modprobe is functioning. Beware that umount will not deallocate
-/dev/loopN device if /etc/mtab file on your system is a symbolic link to
-/proc/mounts. You will need to do it manually using "-d" switch of
-losetup(8). Read losetup(8) manpage for more info.
-
-To create the BFS image under UnixWare you need to find out first which
-slice contains it. The command prtvtoc(1M) is your friend:
-
-# prtvtoc /dev/rdsk/c0b0t0d0s0
-
-(assuming your root disk is on target=0, lun=0, bus=0, controller=0). Then you
-look for the slice with tag "STAND", which is usually slice 10. With this
-information you can use dd(1) to create the BFS image:
-
-# umount /stand
-# dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
-
-Just in case, you can verify that you have done the right thing by checking
-the magic number:
-
-# od -Ad -tx4 stand.img | more
-
-The first 4 bytes should be 0x1badface.
-
-If you have any patches, questions or suggestions regarding this BFS
-implementation please contact the author:
-
-Tigran Aivazian <aivazian.tigran@gmail.com>
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 98de437f5500..f74e6b273d9f 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -53,6 +53,7 @@ Documentation for filesystem implementations.
    autofs
    autofs-mount-control
    befs
+   bfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 5d43e1bc2dfccbb07ea662fa4536544f1b6efd43 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:54 +0100
Subject: docs: filesystems: convert btrfs.txt to ReST

Just trivial changes:

- Add a SPDX header;
- Add it to filesystems/index.rst.

While here, adjust the document title to use the same style as
the other docs.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>
Link: https://lore.kernel.org/r/1ef76da4ac24a9a6f6187723554733c702ea19ae.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/btrfs.rst | 34 ++++++++++++++++++++++++++++++++++
 Documentation/filesystems/btrfs.txt | 31 -------------------------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 35 insertions(+), 31 deletions(-)
 create mode 100644 Documentation/filesystems/btrfs.rst
 delete mode 100644 Documentation/filesystems/btrfs.txt

diff --git a/Documentation/filesystems/btrfs.rst b/Documentation/filesystems/btrfs.rst
new file mode 100644
index 000000000000..d0904f602819
--- /dev/null
+++ b/Documentation/filesystems/btrfs.rst
@@ -0,0 +1,34 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====
+BTRFS
+=====
+
+Btrfs is a copy-on-write filesystem for Linux aimed at implementing advanced
+features while focusing on fault tolerance, repair and easy administration.
+It is jointly developed by several companies, licensed under the GPL and open
+for contribution from anyone.
+
+The main Btrfs features include:
+
+    * Extent-based file storage (2^64 max file size)
+    * Space-efficient packing of small files
+    * Space-efficient indexed directories
+    * Dynamic inode allocation
+    * Writable snapshots
+    * Subvolumes (separate internal filesystem roots)
+    * Object level mirroring and striping
+    * Checksums on data and metadata (multiple algorithms available)
+    * Compression
+    * Integrated multiple device support, with several RAID algorithms
+    * Offline filesystem check
+    * Efficient incremental backup and FS mirroring
+    * Online filesystem defragmentation
+
+For more information please refer to the wiki
+
+  https://btrfs.wiki.kernel.org
+
+that maintains information about administration tasks, frequently asked
+questions, use cases, mount options, comprehensible changelogs, features,
+manual pages, source code repositories, contacts etc.
diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
deleted file mode 100644
index f9dad22d95ce..000000000000
--- a/Documentation/filesystems/btrfs.txt
+++ /dev/null
@@ -1,31 +0,0 @@
-BTRFS
-=====
-
-Btrfs is a copy on write filesystem for Linux aimed at implementing advanced
-features while focusing on fault tolerance, repair and easy administration.
-Jointly developed by several companies, licensed under the GPL and open for
-contribution from anyone.
-
-The main Btrfs features include:
-
-    * Extent based file storage (2^64 max file size)
-    * Space efficient packing of small files
-    * Space efficient indexed directories
-    * Dynamic inode allocation
-    * Writable snapshots
-    * Subvolumes (separate internal filesystem roots)
-    * Object level mirroring and striping
-    * Checksums on data and metadata (multiple algorithms available)
-    * Compression
-    * Integrated multiple device support, with several raid algorithms
-    * Offline filesystem check
-    * Efficient incremental backup and FS mirroring
-    * Online filesystem defragmentation
-
-For more information please refer to the wiki
-
-  https://btrfs.wiki.kernel.org
-
-that maintains information about administration tasks, frequently asked
-questions, use cases, mount options, comprehensible changelogs, features,
-manual pages, source code repositories, contacts etc.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f74e6b273d9f..dae862cf167e 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -54,6 +54,7 @@ Documentation for filesystem implementations.
    autofs-mount-control
    befs
    bfs
+   btrfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 471379a174aa444b326d1b74e9f96a8b4b766b79 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:55 +0100
Subject: docs: filesystems: convert ceph.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/df2f142b5ca5842e030d8209482dfd62dcbe020f.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/ceph.rst  | 190 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ceph.txt  | 186 -----------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 191 insertions(+), 186 deletions(-)
 create mode 100644 Documentation/filesystems/ceph.rst
 delete mode 100644 Documentation/filesystems/ceph.txt

diff --git a/Documentation/filesystems/ceph.rst b/Documentation/filesystems/ceph.rst
new file mode 100644
index 000000000000..b46a7218248f
--- /dev/null
+++ b/Documentation/filesystems/ceph.rst
@@ -0,0 +1,190 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Ceph Distributed File System
+============================
+
+Ceph is a distributed network file system designed to provide good
+performance, reliability, and scalability.
+
+Basic features include:
+
+ * POSIX semantics
+ * Seamless scaling from 1 to many thousands of nodes
+ * High availability and reliability.  No single point of failure.
+ * N-way replication of data across storage nodes
+ * Fast recovery from node failures
+ * Automatic rebalancing of data on node addition/removal
+ * Easy deployment: most FS components are userspace daemons
+
+Also,
+
+ * Flexible snapshots (on any directory)
+ * Recursive accounting (nested files, directories, bytes)
+
+In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely
+on symmetric access by all clients to shared block devices, Ceph
+separates data and metadata management into independent server
+clusters, similar to Lustre.  Unlike Lustre, however, metadata and
+storage nodes run entirely as user space daemons.  File data is striped
+across storage nodes in large chunks to distribute workload and
+facilitate high throughput.  When storage nodes fail, data is
+re-replicated in a distributed fashion by the storage nodes themselves
+(with some minimal coordination from a cluster monitor), making the
+system extremely efficient and scalable.
+
+Metadata servers effectively form a large, consistent, distributed
+in-memory cache above the file namespace that is extremely scalable,
+dynamically redistributes metadata in response to workload changes,
+and can tolerate arbitrary (well, non-Byzantine) node failures.  The
+metadata server takes a somewhat unconventional approach to metadata
+storage to significantly improve performance for common workloads.  In
+particular, inodes with only a single link are embedded in
+directories, allowing entire directories of dentries and inodes to be
+loaded into its cache with a single I/O operation.  The contents of
+extremely large directories can be fragmented and managed by
+independent metadata servers, allowing scalable concurrent access.
+
+The system offers automatic data rebalancing/migration when scaling
+from a small cluster of just a few nodes to many hundreds, without
+requiring an administrator to carve the data set into static volumes or
+go through the tedious process of migrating data between servers.
+When the file system is nearly full, new nodes can easily be added
+and things will "just work."
+
+Ceph includes a flexible snapshot mechanism that allows a user to create
+a snapshot on any subdirectory (and its nested contents) in the
+system.  Snapshot creation and deletion are as simple as 'mkdir
+.snap/foo' and 'rmdir .snap/foo'.
+
+Ceph also provides some recursive accounting on directories for nested
+files and bytes.  That is, a 'getfattr -d foo' on any directory in the
+system will reveal the total number of nested regular files and
+subdirectories, and a summation of all nested file sizes.  This makes
+the identification of large disk space consumers relatively quick, as
+no 'du' or similar recursive scan of the file system is required.
+
+Finally, Ceph also allows quotas to be set on any directory in the system.
+The quota can restrict the number of bytes or the number of files stored
+beneath that point in the directory hierarchy.  Quotas can be set using
+extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', e.g.::
+
+ setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir
+ getfattr -n ceph.quota.max_bytes /some/dir
+
+A limitation of the current quotas implementation is that it relies on the
+cooperation of the client mounting the file system to stop writers when a
+limit is reached.  A modified or adversarial client cannot be prevented
+from writing as much data as it needs.
+
+Mount Syntax
+============
+
+The basic mount syntax is::
+
+ # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
+
+You only need to specify a single monitor, as the client will get the
+full list when it connects.  (However, if the monitor you specify
+happens to be down, the mount won't succeed.)  The port can be left
+off if the monitor is using the default.  So if the monitor is at
+1.2.3.4::
+
+ # mount -t ceph 1.2.3.4:/ /mnt/ceph
+
+is sufficient.  If /sbin/mount.ceph is installed, a hostname can be
+used instead of an IP address.
+
+
+
+Mount Options
+=============
+
+  ip=A.B.C.D[:N]
+	Specify the IP and/or port the client should bind to locally.
+	There is normally not much reason to do this.  If the IP is not
+	specified, the client's IP address is determined by looking at the
+	address its connection to the monitor originates from.
+
+  wsize=X
+	Specify the maximum write size in bytes.  Default: 16 MB.
+
+  rsize=X
+	Specify the maximum read size in bytes.  Default: 16 MB.
+
+  rasize=X
+	Specify the maximum readahead size in bytes.  Default: 8 MB.
+
+  mount_timeout=X
+	Specify the timeout value for mount (in seconds), in the case
+	of a non-responsive Ceph file system.  The default is 30
+	seconds.
+
+  caps_max=X
+	Specify the maximum number of caps to hold. Unused caps are released
+	when the number of caps exceeds the limit. The default is 0 (no limit).
+
+  rbytes
+	When stat() is called on a directory, set st_size to 'rbytes',
+	the summation of file sizes over all files nested beneath that
+	directory.  This is the default.
+
+  norbytes
+	When stat() is called on a directory, set st_size to the
+	number of entries in that directory.
+
+  nocrc
+	Disable CRC32C calculation for data writes.  If set, the storage node
+	must rely on TCP's checksum to detect data corruption
+	in the data payload.
+
+  dcache
+        Use the dcache contents to perform negative lookups and
+        readdir when the client has the entire directory contents in
+        its cache.  (This does not change correctness; the client uses
+        cached metadata only when a lease or capability ensures it is
+        valid.)
+
+  nodcache
+        Do not use the dcache as above.  This avoids a significant amount of
+        complex code, sacrificing performance without affecting correctness,
+        and is useful for tracking down bugs.
+
+  noasyncreaddir
+	Do not use the dcache as above for readdir.
+
+  noquotadf
+        Report overall filesystem usage in statfs instead of using the root
+        directory quota.
+
+  nocopyfrom
+        Don't use the RADOS 'copy-from' operation to perform remote object
+        copies.  Currently, it's only used in copy_file_range, which will revert
+        to the default VFS implementation if this option is used.
+
+  recover_session=<no|clean>
+	Set the automatic reconnect mode for when the client is blacklisted.
+	The available modes are "no" and "clean". The default is "no".
+
+	* no: never attempt to reconnect when the client detects that it has
+	  been blacklisted. Operations will generally fail after being
+	  blacklisted.
+
+	* clean: the client reconnects to the Ceph cluster automatically when
+	  it detects that it has been blacklisted. During reconnect, the client
+	  drops dirty data/metadata and invalidates page caches and writable
+	  file handles. After reconnect, file locks become stale because the
+	  MDS loses track of them. If an inode contains any stale file locks,
+	  read/write on the inode is not allowed until applications release
+	  all stale file locks.
+
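To make the option defaults listed above concrete, here is a userspace C sketch that applies a comma-separated option string to a hypothetical struct ceph_opts. It is not the kernel's option parser. It covers only a subset of the options, and it treats "16 MB" as 16 * 2^20 bytes, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory view of some of the options documented above. */
struct ceph_opts {
    unsigned long wsize;          /* bytes, default 16 MB */
    unsigned long rsize;          /* bytes, default 16 MB */
    unsigned long rasize;         /* bytes, default 8 MB */
    unsigned long mount_timeout;  /* seconds, default 30 */
    unsigned long caps_max;       /* 0 means no limit */
    bool rbytes;                  /* rbytes is the default; norbytes clears it */
    bool recover_clean;           /* recover_session=clean; default is "no" */
};

static void ceph_opts_defaults(struct ceph_opts *o)
{
    o->wsize = 16UL << 20;
    o->rsize = 16UL << 20;
    o->rasize = 8UL << 20;
    o->mount_timeout = 30;
    o->caps_max = 0;
    o->rbytes = true;
    o->recover_clean = false;
}

/* Apply "key=value" and flag options; returns 0, or -1 on bad input. */
static int ceph_opts_parse(struct ceph_opts *o, const char *optstr)
{
    char buf[256];

    assert(o != NULL && optstr != NULL);
    if (strlen(optstr) >= sizeof(buf))
        return -1;
    strcpy(buf, optstr);              /* strtok() modifies its input */
    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        char *val = strchr(tok, '=');

        if (val != NULL)
            *val++ = '\0';            /* split "key=value" in place */
        if (val != NULL && strcmp(tok, "wsize") == 0)
            o->wsize = strtoul(val, NULL, 10);
        else if (val != NULL && strcmp(tok, "rsize") == 0)
            o->rsize = strtoul(val, NULL, 10);
        else if (val != NULL && strcmp(tok, "rasize") == 0)
            o->rasize = strtoul(val, NULL, 10);
        else if (val != NULL && strcmp(tok, "mount_timeout") == 0)
            o->mount_timeout = strtoul(val, NULL, 10);
        else if (val != NULL && strcmp(tok, "caps_max") == 0)
            o->caps_max = strtoul(val, NULL, 10);
        else if (val == NULL && strcmp(tok, "rbytes") == 0)
            o->rbytes = true;
        else if (val == NULL && strcmp(tok, "norbytes") == 0)
            o->rbytes = false;
        else if (val != NULL && strcmp(tok, "recover_session") == 0 &&
                 (strcmp(val, "no") == 0 || strcmp(val, "clean") == 0))
            o->recover_clean = strcmp(val, "clean") == 0;
        else
            return -1;                /* unknown or malformed option */
    }
    return 0;
}
```

Options not mentioned in the string keep their defaults, so "rsize=65536,norbytes" changes only the read size and the directory st_size behavior.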
+More Information
+================
+
+For more information on Ceph, see the home page at
+	https://ceph.com/
+
+The Linux kernel client source tree is available at
+	- https://github.com/ceph/ceph-client.git
+	- git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
+
+and the source for the full system is at
+	https://github.com/ceph/ceph.git
diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt
deleted file mode 100644
index b19b6a03f91c..000000000000
--- a/Documentation/filesystems/ceph.txt
+++ /dev/null
@@ -1,186 +0,0 @@
-Ceph Distributed File System
-============================
-
-Ceph is a distributed network file system designed to provide good
-performance, reliability, and scalability.
-
-Basic features include:
-
- * POSIX semantics
- * Seamless scaling from 1 to many thousands of nodes
- * High availability and reliability.  No single point of failure.
- * N-way replication of data across storage nodes
- * Fast recovery from node failures
- * Automatic rebalancing of data on node addition/removal
- * Easy deployment: most FS components are userspace daemons
-
-Also,
- * Flexible snapshots (on any directory)
- * Recursive accounting (nested files, directories, bytes)
-
-In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely
-on symmetric access by all clients to shared block devices, Ceph
-separates data and metadata management into independent server
-clusters, similar to Lustre.  Unlike Lustre, however, metadata and
-storage nodes run entirely as user space daemons.  File data is striped
-across storage nodes in large chunks to distribute workload and
-facilitate high throughputs.  When storage nodes fail, data is
-re-replicated in a distributed fashion by the storage nodes themselves
-(with some minimal coordination from a cluster monitor), making the
-system extremely efficient and scalable.
-
-Metadata servers effectively form a large, consistent, distributed
-in-memory cache above the file namespace that is extremely scalable,
-dynamically redistributes metadata in response to workload changes,
-and can tolerate arbitrary (well, non-Byzantine) node failures.  The
-metadata server takes a somewhat unconventional approach to metadata
-storage to significantly improve performance for common workloads.  In
-particular, inodes with only a single link are embedded in
-directories, allowing entire directories of dentries and inodes to be
-loaded into its cache with a single I/O operation.  The contents of
-extremely large directories can be fragmented and managed by
-independent metadata servers, allowing scalable concurrent access.
-
-The system offers automatic data rebalancing/migration when scaling
-from a small cluster of just a few nodes to many hundreds, without
-requiring an administrator carve the data set into static volumes or
-go through the tedious process of migrating data between servers.
-When the file system approaches full, new nodes can be easily added
-and things will "just work."
-
-Ceph includes flexible snapshot mechanism that allows a user to create
-a snapshot on any subdirectory (and its nested contents) in the
-system.  Snapshot creation and deletion are as simple as 'mkdir
-.snap/foo' and 'rmdir .snap/foo'.
-
-Ceph also provides some recursive accounting on directories for nested
-files and bytes.  That is, a 'getfattr -d foo' on any directory in the
-system will reveal the total number of nested regular files and
-subdirectories, and a summation of all nested file sizes.  This makes
-the identification of large disk space consumers relatively quick, as
-no 'du' or similar recursive scan of the file system is required.
-
-Finally, Ceph also allows quotas to be set on any directory in the system.
-The quota can restrict the number of bytes or the number of files stored
-beneath that point in the directory hierarchy.  Quotas can be set using
-extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', eg:
-
- setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir
- getfattr -n ceph.quota.max_bytes /some/dir
-
-A limitation of the current quotas implementation is that it relies on the
-cooperation of the client mounting the file system to stop writers when a
-limit is reached.  A modified or adversarial client cannot be prevented
-from writing as much data as it needs.
-
-Mount Syntax
-============
-
-The basic mount syntax is:
-
- # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
-
-You only need to specify a single monitor, as the client will get the
-full list when it connects.  (However, if the monitor you specify
-happens to be down, the mount won't succeed.)  The port can be left
-off if the monitor is using the default.  So if the monitor is at
-1.2.3.4,
-
- # mount -t ceph 1.2.3.4:/ /mnt/ceph
-
-is sufficient.  If /sbin/mount.ceph is installed, a hostname can be
-used instead of an IP address.
-
-
-
-Mount Options
-=============
-
-  ip=A.B.C.D[:N]
-	Specify the IP and/or port the client should bind to locally.
-	There is normally not much reason to do this.  If the IP is not
-	specified, the client's IP address is determined by looking at the
-	address its connection to the monitor originates from.
-
-  wsize=X
-	Specify the maximum write size in bytes.  Default: 16 MB.
-
-  rsize=X
-	Specify the maximum read size in bytes.  Default: 16 MB.
-
-  rasize=X
-	Specify the maximum readahead size in bytes.  Default: 8 MB.
-
-  mount_timeout=X
-	Specify the timeout value for mount (in seconds), in the case
-	of a non-responsive Ceph file system.  The default is 30
-	seconds.
-
-  caps_max=X
-	Specify the maximum number of caps to hold. Unused caps are released
-	when number of caps exceeds the limit. The default is 0 (no limit)
-
-  rbytes
-	When stat() is called on a directory, set st_size to 'rbytes',
-	the summation of file sizes over all files nested beneath that
-	directory.  This is the default.
-
-  norbytes
-	When stat() is called on a directory, set st_size to the
-	number of entries in that directory.
-
-  nocrc
-	Disable CRC32C calculation for data writes.  If set, the storage node
-	must rely on TCP's error correction to detect data corruption
-	in the data payload.
-
-  dcache
-        Use the dcache contents to perform negative lookups and
-        readdir when the client has the entire directory contents in
-        its cache.  (This does not change correctness; the client uses
-        cached metadata only when a lease or capability ensures it is
-        valid.)
-
-  nodcache
-        Do not use the dcache as above.  This avoids a significant amount of
-        complex code, sacrificing performance without affecting correctness,
-        and is useful for tracking down bugs.
-
-  noasyncreaddir
-	Do not use the dcache as above for readdir.
-
-  noquotadf
-        Report overall filesystem usage in statfs instead of using the root
-        directory quota.
-
-  nocopyfrom
-        Don't use the RADOS 'copy-from' operation to perform remote object
-        copies.  Currently, it's only used in copy_file_range, which will revert
-        to the default VFS implementation if this option is used.
-
-  recover_session=<no|clean>
-	Set auto reconnect mode in the case where the client is blacklisted. The
-	available modes are "no" and "clean". The default is "no".
-
-	* no: never attempt to reconnect when client detects that it has been
-	blacklisted. Operations will generally fail after being blacklisted.
-
-	* clean: client reconnects to the ceph cluster automatically when it
-	detects that it has been blacklisted. During reconnect, client drops
-	dirty data/metadata, invalidates page caches and writable file handles.
-	After reconnect, file locks become stale because the MDS loses track
-	of them. If an inode contains any stale file locks, read/write on the
-	inode is not allowed until applications release all stale file locks.
-
-More Information
-================
-
-For more information on Ceph, see the home page at
-	https://ceph.com/
-
-The Linux kernel client source tree is available at
-	https://github.com/ceph/ceph-client.git
-	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
-
-and the source for the full system is at
-	https://github.com/ceph/ceph.git
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index dae862cf167e..ddd8f7b2bb25 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -55,6 +55,7 @@ Documentation for filesystem implementations.
    befs
    bfs
    btrfs
+   ceph
    fuse
    overlayfs
    virtiofs
-- 


From f1fa0e6028d395c5f0d1a0929a795b8dc0d43295 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:56 +0100
Subject: docs: filesystems: convert cramfs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Link: https://lore.kernel.org/r/e87b267e71f99974b7bb3fc0a4a08454ff58165e.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/cramfs.rst | 123 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/cramfs.txt | 118 ---------------------------------
 Documentation/filesystems/index.rst  |   1 +
 3 files changed, 124 insertions(+), 118 deletions(-)
 create mode 100644 Documentation/filesystems/cramfs.rst
 delete mode 100644 Documentation/filesystems/cramfs.txt

diff --git a/Documentation/filesystems/cramfs.rst b/Documentation/filesystems/cramfs.rst
new file mode 100644
index 000000000000..afbdbde98bd2
--- /dev/null
+++ b/Documentation/filesystems/cramfs.rst
@@ -0,0 +1,123 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Cramfs - cram a filesystem onto a small ROM
+===========================================
+
+cramfs is designed to be simple and small, and to compress things well.
+
+It uses the zlib routines to compress a file one page at a time, and
+allows random page access.  The metadata is not compressed, but is
+expressed in a very terse representation to make it use much less
+disk space than traditional filesystems.
+
+You can't write to a cramfs filesystem (making it compressible and
+compact also makes it *very* hard to update on-the-fly), so you have to
+create the disk image with the "mkcramfs" utility.
+
+
+Usage Notes
+-----------
+
+File sizes are limited to less than 16MB.
+
+Maximum filesystem size is a little over 256MB.  (The last file on the
+filesystem is allowed to extend past 256MB.)
+
+Only the low 8 bits of gid are stored.  The current version of
+mkcramfs simply truncates to 8 bits, which is a potential security
+issue.
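The size and gid limits above can be illustrated with a short C sketch. It assumes the on-disk inode stores the file size in a 24-bit field and the gid in an 8-bit field, which is what the 16MB and low-8-bits limits imply. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field widths assumed from the limits described above. */
#define CRAMFS_SIZE_BITS 24   /* sizes must be below 2^24 bytes (16MB) */
#define CRAMFS_GID_BITS   8   /* only the low 8 bits of the gid survive */

/* Can a file of this size be stored at all? */
static bool cramfs_size_fits(uint64_t size)
{
    return size < (UINT64_C(1) << CRAMFS_SIZE_BITS);
}

/* The gid a file ends up with after truncation to CRAMFS_GID_BITS. */
static uint32_t cramfs_stored_gid(uint32_t gid)
{
    uint32_t stored = gid & ((UINT32_C(1) << CRAMFS_GID_BITS) - 1);

    assert(stored <= UINT8_MAX);
    return stored;
}
```

For example, gid 1000 is stored as 232 and gid 256 as 0, so files owned by group 256 would appear to belong to group 0 (usually root) once the image is mounted. This is why the truncation is a potential security issue.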
+
+Hard links are supported, but hard linked files
+will still have a link count of 1 in the cramfs image.
+
+Cramfs directories have no ``.`` or ``..`` entries.  Directories (like
+every other file on cramfs) always have a link count of 1.  (There's
+no need to use -noleaf in ``find``, btw.)
+
+No timestamps are stored in a cramfs, so these default to the epoch
+(1970 GMT).  Recently-accessed files may have updated timestamps, but
+the update lasts only as long as the inode is cached in memory, after
+which the timestamp reverts to 1970, i.e. moves backwards in time.
+
+Currently, cramfs must be written and read with architectures of the
+same endianness, and can be read only by kernels with PAGE_SIZE
+== 4096.  At least the latter of these is a bug, but it hasn't been
+decided what the best fix is.  For the moment if you have larger pages
+you can just change the #define in mkcramfs.c, so long as you don't
+mind the filesystem becoming unreadable to future kernels.
+
+
+Memory Mapped cramfs image
+--------------------------
+
+The CRAMFS_MTD Kconfig option adds support for loading data directly from
+a physical linear memory range (usually non-volatile memory like Flash)
+instead of going through the block device layer. This saves some memory
+since no intermediate buffering is necessary to hold the data before
+decompressing.
+
+When data blocks are kept uncompressed and properly aligned, they are
+automatically mapped directly into user space whenever possible, providing
+eXecute-In-Place (XIP) of read-only segments from ROM. Data segments mapped
+read-write (which therefore have to be copied to RAM) may still be compressed
+in the cramfs image, in the same file as uncompressed read-only segments.
+Both MMU and no-MMU systems are supported. This is particularly handy for
+tiny embedded systems with very tight memory constraints.
+
+The location of the cramfs image in memory is system dependent. You must
+know the proper physical address where the cramfs image is located and
+configure an MTD device for it. Also, that MTD device must be supported
+by a map driver that implements the "point" method. Examples of such
+MTD drivers are cfi_cmdset_0001 (Intel/Sharp CFI flash) or physmap
+(Flash device in physical memory map). MTD partitions based on such devices
+are fine too. Then that device should be specified with the "mtd:" prefix
+as the mount device argument. For example, to mount the MTD device named
+"fs_partition" on the /mnt directory::
+
+    $ mount -t cramfs mtd:fs_partition /mnt
+
+To boot a kernel with this as the root filesystem, it suffices to specify
+something like "root=mtd:fs_partition" on the kernel command line.
+
+
+Tools
+-----
+
+A version of mkcramfs that can take advantage of the latest capabilities
+described above can be found here:
+
+https://github.com/npitre/cramfs-tools
+
+
+For /usr/share/magic
+--------------------
+
+::
+
+	0	ulelong	0x28cd3d45	Linux cramfs offset 0
+	>4	ulelong	x		size %d
+	>8	ulelong	x		flags 0x%x
+	>12	ulelong	x		future 0x%x
+	>16	string	>\0		signature "%.16s"
+	>32	ulelong	x		fsid.crc 0x%x
+	>36	ulelong	x		fsid.edition %d
+	>40	ulelong	x		fsid.blocks %d
+	>44	ulelong	x		fsid.files %d
+	>48	string	>\0		name "%.16s"
+	512	ulelong	0x28cd3d45	Linux cramfs offset 512
+	>516	ulelong	x		size %d
+	>520	ulelong	x		flags 0x%x
+	>524	ulelong	x		future 0x%x
+	>528	string	>\0		signature "%.16s"
+	>544	ulelong	x		fsid.crc 0x%x
+	>548	ulelong	x		fsid.edition %d
+	>552	ulelong	x		fsid.blocks %d
+	>556	ulelong	x		fsid.files %d
+	>560	string	>\0		name "%.16s"
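The magic entries describe the superblock layout: a little-endian magic number 0x28cd3d45 at offset 0 or 512, followed by size, flags, a 16-byte signature, the fsid fields and a 16-byte name. The C sketch below probes a buffer for that layout. cramfs_probe() and struct cramfs_info are hypothetical names, and the sketch relies only on the offsets given in the magic entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRAMFS_MAGIC 0x28cd3d45u   /* stored little-endian ("ulelong") */
#define CRAMFS_SB_LEN 64           /* bytes covered by the magic entries */

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical summary of the fields listed in the magic entries. */
struct cramfs_info {
    size_t offset;                     /* 0 or 512 */
    uint32_t size, flags, crc, edition, blocks, files;
    char signature[17], name[17];      /* 16 bytes plus a terminating NUL */
};

/* Returns 0 and fills *info if a cramfs superblock is found, else -1. */
static int cramfs_probe(const unsigned char *buf, size_t len,
                        struct cramfs_info *info)
{
    static const size_t offsets[] = { 0, 512 };

    assert(buf != NULL && info != NULL);
    for (size_t i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++) {
        const unsigned char *sb;

        if (len < offsets[i] + CRAMFS_SB_LEN)
            continue;                  /* buffer too short for this offset */
        sb = buf + offsets[i];
        if (get_le32(sb) != CRAMFS_MAGIC)
            continue;
        info->offset = offsets[i];
        info->size = get_le32(sb + 4);
        info->flags = get_le32(sb + 8);
        memcpy(info->signature, sb + 16, 16);
        info->signature[16] = '\0';
        info->crc = get_le32(sb + 32);
        info->edition = get_le32(sb + 36);
        info->blocks = get_le32(sb + 40);
        info->files = get_le32(sb + 44);
        memcpy(info->name, sb + 48, 16);
        info->name[16] = '\0';
        return 0;
    }
    return -1;
}
```

Reading the magic byte by byte keeps the check correct on big-endian hosts too. The check at offset 512 corresponds to the second group of magic entries.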
+
+
+Hacker Notes
+------------
+
+See fs/cramfs/README for filesystem layout and implementation notes.
diff --git a/Documentation/filesystems/cramfs.txt b/Documentation/filesystems/cramfs.txt
deleted file mode 100644
index 8e19a53d648b..000000000000
--- a/Documentation/filesystems/cramfs.txt
+++ /dev/null
@@ -1,118 +0,0 @@
-
-	Cramfs - cram a filesystem onto a small ROM
-
-cramfs is designed to be simple and small, and to compress things well. 
-
-It uses the zlib routines to compress a file one page at a time, and
-allows random page access.  The meta-data is not compressed, but is
-expressed in a very terse representation to make it use much less
-diskspace than traditional filesystems. 
-
-You can't write to a cramfs filesystem (making it compressible and
-compact also makes it _very_ hard to update on-the-fly), so you have to
-create the disk image with the "mkcramfs" utility.
-
-
-Usage Notes
------------
-
-File sizes are limited to less than 16MB.
-
-Maximum filesystem size is a little over 256MB.  (The last file on the
-filesystem is allowed to extend past 256MB.)
-
-Only the low 8 bits of gid are stored.  The current version of
-mkcramfs simply truncates to 8 bits, which is a potential security
-issue.
-
-Hard links are supported, but hard linked files
-will still have a link count of 1 in the cramfs image.
-
-Cramfs directories have no `.' or `..' entries.  Directories (like
-every other file on cramfs) always have a link count of 1.  (There's
-no need to use -noleaf in `find', btw.)
-
-No timestamps are stored in a cramfs, so these default to the epoch
-(1970 GMT).  Recently-accessed files may have updated timestamps, but
-the update lasts only as long as the inode is cached in memory, after
-which the timestamp reverts to 1970, i.e. moves backwards in time.
-
-Currently, cramfs must be written and read with architectures of the
-same endianness, and can be read only by kernels with PAGE_SIZE
-== 4096.  At least the latter of these is a bug, but it hasn't been
-decided what the best fix is.  For the moment if you have larger pages
-you can just change the #define in mkcramfs.c, so long as you don't
-mind the filesystem becoming unreadable to future kernels.
-
-
-Memory Mapped cramfs image
---------------------------
-
-The CRAMFS_MTD Kconfig option adds support for loading data directly from
-a physical linear memory range (usually non volatile memory like Flash)
-instead of going through the block device layer. This saves some memory
-since no intermediate buffering is necessary to hold the data before
-decompressing.
-
-And when data blocks are kept uncompressed and properly aligned, they will
-automatically be mapped directly into user space whenever possible providing
-eXecute-In-Place (XIP) from ROM of read-only segments. Data segments mapped
-read-write (hence they have to be copied to RAM) may still be compressed in
-the cramfs image in the same file along with non compressed read-only
-segments. Both MMU and no-MMU systems are supported. This is particularly
-handy for tiny embedded systems with very tight memory constraints.
-
-The location of the cramfs image in memory is system dependent. You must
-know the proper physical address where the cramfs image is located and
-configure an MTD device for it. Also, that MTD device must be supported
-by a map driver that implements the "point" method. Examples of such
-MTD drivers are cfi_cmdset_0001 (Intel/Sharp CFI flash) or physmap
-(Flash device in physical memory map). MTD partitions based on such devices
-are fine too. Then that device should be specified with the "mtd:" prefix
-as the mount device argument. For example, to mount the MTD device named
-"fs_partition" on the /mnt directory:
-
-$ mount -t cramfs mtd:fs_partition /mnt
-
-To boot a kernel with this as root filesystem, suffice to specify
-something like "root=mtd:fs_partition" on the kernel command line.
-
-
-Tools
------
-
-A version of mkcramfs that can take advantage of the latest capabilities
-described above can be found here:
-
-https://github.com/npitre/cramfs-tools
-
-
-For /usr/share/magic
---------------------
-
-0	ulelong	0x28cd3d45	Linux cramfs offset 0
->4	ulelong	x		size %d
->8	ulelong	x		flags 0x%x
->12	ulelong	x		future 0x%x
->16	string	>\0		signature "%.16s"
->32	ulelong	x		fsid.crc 0x%x
->36	ulelong	x		fsid.edition %d
->40	ulelong	x		fsid.blocks %d
->44	ulelong	x		fsid.files %d
->48	string	>\0		name "%.16s"
-512	ulelong	0x28cd3d45	Linux cramfs offset 512
->516	ulelong	x		size %d
->520	ulelong	x		flags 0x%x
->524	ulelong	x		future 0x%x
->528	string	>\0		signature "%.16s"
->544	ulelong	x		fsid.crc 0x%x
->548	ulelong	x		fsid.edition %d
->552	ulelong	x		fsid.blocks %d
->556	ulelong	x		fsid.files %d
->560	string	>\0		name "%.16s"
-
-
-Hacker Notes
-------------
-
-See fs/cramfs/README for filesystem layout and implementation notes.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index ddd8f7b2bb25..8fe848ea04af 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -56,6 +56,7 @@ Documentation for filesystem implementations.
    bfs
    btrfs
    ceph
+   cramfs
    fuse
    overlayfs
    virtiofs
-- 


From 57443789849cd79e66488301a01f01c6340942ce Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:57 +0100
Subject: docs: filesystems: convert debugfs.txt to ReST

- Add a SPDX header;
- Use copyright symbol;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Use footnote markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/42db8f9db17a5d8b619130815ae63d1615951d50.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/debugfs.rst | 247 ++++++++++++++++++++++++++++++++++
 Documentation/filesystems/debugfs.txt | 241 ---------------------------------
 Documentation/filesystems/index.rst   |   1 +
 3 files changed, 248 insertions(+), 241 deletions(-)
 create mode 100644 Documentation/filesystems/debugfs.rst
 delete mode 100644 Documentation/filesystems/debugfs.txt

diff --git a/Documentation/filesystems/debugfs.rst b/Documentation/filesystems/debugfs.rst
new file mode 100644
index 000000000000..c89d2d335dfb
--- /dev/null
+++ b/Documentation/filesystems/debugfs.rst
@@ -0,0 +1,247 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=======
+DebugFS
+=======
+
+Copyright |copy| 2009 Jonathan Corbet <corbet@lwn.net>
+
+Debugfs exists as a simple way for kernel developers to make information
+available to user space.  Unlike /proc, which is only meant for information
+about a process, or sysfs, which has strict one-value-per-file rules,
+debugfs has no rules at all.  Developers can put any information they want
+there.  The debugfs filesystem is also not intended to serve as a stable
+ABI to user space; in theory, there are no stability constraints placed on
+files exported there.  The real world is not always so simple, though [1]_;
+even debugfs interfaces are best designed with the idea that they will need
+to be maintained forever.
+
+Debugfs is typically mounted with a command like::
+
+    mount -t debugfs none /sys/kernel/debug
+
+(Or an equivalent /etc/fstab line).
+The debugfs root directory is accessible only to the root user by
+default. To change access to the tree, the "uid", "gid" and "mode" mount
+options can be used.
+
+Note that the debugfs API is exported GPL-only to modules.
+
+Code using debugfs should include <linux/debugfs.h>.  Then, the first order
+of business will be to create at least one directory to hold a set of
+debugfs files::
+
+    struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
+
+This call, if successful, will make a directory called name underneath the
+indicated parent directory.  If parent is NULL, the directory will be
+created in the debugfs root.  On success, the return value is a struct
+dentry pointer which can be used to create files in the directory (and to
+clean it up at the end).  An ERR_PTR(-ERROR) return value indicates that
+something went wrong.  If ERR_PTR(-ENODEV) is returned, that is an
+indication that the kernel has been built without debugfs support and none
+of the functions described below will work.
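In kernel code, callers usually tell a valid dentry from an ERR_PTR() value with IS_ERR() and PTR_ERR() from <linux/err.h>. The sketch below re-creates those helpers in userspace to show how the encoding works: the top MAX_ERRNO addresses are never valid pointers, so a pointer in that range carries a negative errno. fake_create_dir() is a hypothetical stand-in for debugfs_create_dir(), not a real API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace copies of the <linux/err.h> helpers, for illustration only.
 * The top MAX_ERRNO addresses are never valid pointers, so a pointer in
 * that range encodes a negative errno value.
 */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for debugfs_create_dir(); not a real API. */
static int fake_dir;

static void *fake_create_dir(bool debugfs_enabled)
{
    return debugfs_enabled ? (void *)&fake_dir : ERR_PTR(-ENODEV);
}

/* Caller pattern: 0 for a usable dentry, otherwise the negative errno. */
static int check_dir(const void *d)
{
    if (IS_ERR(d))
        return (int)PTR_ERR(d);       /* -ENODEV: built without debugfs */
    return 0;
}
```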
+
+The most general way to create a file within a debugfs directory is with::
+
+    struct dentry *debugfs_create_file(const char *name, umode_t mode,
+				       struct dentry *parent, void *data,
+				       const struct file_operations *fops);
+
+Here, name is the name of the file to create, mode describes the access
+permissions the file should have, parent indicates the directory which
+should hold the file, data will be stored in the i_private field of the
+resulting inode structure, and fops is a set of file operations which
+implement the file's behavior.  At a minimum, the read() and/or write()
+operations should be provided; others can be included as needed.  Again,
+the return value will be a dentry pointer to the created file,
+ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is
+missing.
+
+To create a file with an initial size, the following function can be used
+instead::
+
+    struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
+				struct dentry *parent, void *data,
+				const struct file_operations *fops,
+				loff_t file_size);
+
+file_size is the initial file size. The other parameters are the same
+as for debugfs_create_file().
+
+In a number of cases, the creation of a set of file operations is not
+actually necessary; the debugfs code provides a number of helper functions
+for simple situations.  Files containing a single integer value can be
+created with any of::
+
+    void debugfs_create_u8(const char *name, umode_t mode,
+			   struct dentry *parent, u8 *value);
+    void debugfs_create_u16(const char *name, umode_t mode,
+			    struct dentry *parent, u16 *value);
+    struct dentry *debugfs_create_u32(const char *name, umode_t mode,
+				      struct dentry *parent, u32 *value);
+    void debugfs_create_u64(const char *name, umode_t mode,
+			    struct dentry *parent, u64 *value);
+
+These files support both reading and writing the given value; if a specific
+file should not be written to, simply set the mode bits accordingly.  The
+values in these files are in decimal; if hexadecimal is more appropriate,
+the following functions can be used instead::
+
+    void debugfs_create_x8(const char *name, umode_t mode,
+			   struct dentry *parent, u8 *value);
+    void debugfs_create_x16(const char *name, umode_t mode,
+			    struct dentry *parent, u16 *value);
+    void debugfs_create_x32(const char *name, umode_t mode,
+			    struct dentry *parent, u32 *value);
+    void debugfs_create_x64(const char *name, umode_t mode,
+			    struct dentry *parent, u64 *value);
+
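The decimal and hexadecimal helpers differ only in how a read() renders the value. The userspace sketch below models that. The format strings are assumptions meant to mirror those in fs/debugfs/file.c (decimal for the u* helpers, "0x" plus zero padding to the type's width for the x* helpers), so check them against the kernel version in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Render what reading a u<bits> or x<bits> debugfs file would return. */
static int render(char *buf, size_t len, unsigned long long v, int bits,
                  bool hex)
{
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
    if (hex)   /* debugfs_create_x*: zero-padded to bits/4 hex digits */
        return snprintf(buf, len, "0x%0*llx\n", bits / 4, v);
    return snprintf(buf, len, "%llu\n", v);    /* debugfs_create_u* */
}
```

Under these assumptions, the same value 255 reads as "255" from a u8 file, "0xff" from an x8 file and "0x000000ff" from an x32 file.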
+These functions are useful as long as the developer knows the size of the
+value to be exported.  Some types can have different widths on different
+architectures, though, complicating the situation somewhat.  There are
+functions meant to help out in such special cases::
+
+    void debugfs_create_size_t(const char *name, umode_t mode,
+			       struct dentry *parent, size_t *value);
+
+As might be expected, this function will create a debugfs file to represent
+a variable of type size_t.
+
+Similarly, there are helpers for variables of type unsigned long, in decimal
+and hexadecimal::
+
+    struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
+					struct dentry *parent,
+					unsigned long *value);
+    void debugfs_create_xul(const char *name, umode_t mode,
+			    struct dentry *parent, unsigned long *value);
+
+Boolean values can be placed in debugfs with::
+
+    struct dentry *debugfs_create_bool(const char *name, umode_t mode,
+				       struct dentry *parent, bool *value);
+
+A read on the resulting file will yield either Y (for non-zero values) or
+N, followed by a newline.  If written to, it will accept either upper- or
+lower-case values, or 1 or 0.  Any other input will be silently ignored.
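The read and write behavior of a boolean file can be modeled in userspace as shown below. This is an illustrative sketch, not the kernel code. bool_read() and bool_write() are hypothetical, and the model assumes that only the first character of the written input is examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A read returns "Y" for true (non-zero) or "N" for false, plus a newline. */
static const char *bool_read(bool v)
{
    return v ? "Y\n" : "N\n";
}

/*
 * A write: y/Y/1 set the value, n/N/0 clear it, and any other input
 * leaves it untouched.  Looking only at the first character is an
 * assumption of this model.
 */
static void bool_write(bool *v, const char *input, size_t len)
{
    assert(v != NULL && input != NULL);
    if (len == 0)
        return;
    switch (input[0]) {
    case 'y': case 'Y': case '1':
        *v = true;
        break;
    case 'n': case 'N': case '0':
        *v = false;
        break;
    default:
        break;                        /* silently ignored */
    }
}
```

Writing "x" leaves the value untouched, which matches the silently-ignored case described above.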
+
+Also, atomic_t values can be placed in debugfs with::
+
+    void debugfs_create_atomic_t(const char *name, umode_t mode,
+				 struct dentry *parent, atomic_t *value);
+
+Reading this file returns the atomic_t value, and writing to it sets the
+value.
+
+Another option is exporting a block of arbitrary binary data, with
+this structure and function::
+
+    struct debugfs_blob_wrapper {
+	void *data;
+	unsigned long size;
+    };
+
+    struct dentry *debugfs_create_blob(const char *name, umode_t mode,
+				       struct dentry *parent,
+				       struct debugfs_blob_wrapper *blob);
+
+A read of this file will return the data pointed to by the
+debugfs_blob_wrapper structure.  Some drivers use "blobs" as a simple way
+to return several lines of (static) formatted text output.  This function
+can be used to export binary information, but there does not appear to be
+any code which does so in the mainline.  Note that all files created with
+debugfs_create_blob() are read-only.
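A blob file is read like any other file, so a reader may receive the data in several pieces. The sketch below models a read() at a file position in userspace. It assumes behavior like the kernel's simple_read_from_buffer(): at most count bytes are copied, the position advances, and 0 signals the end of the data. struct blob_wrapper mirrors debugfs_blob_wrapper. blob_read() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace mirror of struct debugfs_blob_wrapper. */
struct blob_wrapper {
    void *data;
    unsigned long size;
};

/*
 * Hypothetical model of read() on a blob file: copy at most count bytes
 * starting at *ppos, advance *ppos, and return 0 at end of data.
 */
static long blob_read(const struct blob_wrapper *b, char *buf, size_t count,
                      long *ppos)
{
    unsigned long avail;

    assert(b != NULL && buf != NULL && ppos != NULL);
    if (*ppos < 0)
        return -1;                    /* invalid position */
    if ((unsigned long)*ppos >= b->size)
        return 0;                     /* end of data */
    avail = b->size - (unsigned long)*ppos;
    if (count > avail)
        count = avail;
    memcpy(buf, (const char *)b->data + *ppos, count);
    *ppos += (long)count;
    return (long)count;
}
```

With an 18-byte blob and an 8-byte buffer, successive reads return 8, 8, 2 and then 0 bytes.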
+
+If you want to dump a block of registers (something that happens quite
+often during development, even if little such code reaches mainline),
+debugfs offers two functions: one to make a registers-only file, and
+another to insert a register block in the middle of another sequential
+file::
+
+    struct debugfs_reg32 {
+	char *name;
+	unsigned long offset;
+    };
+
+    struct debugfs_regset32 {
+	struct debugfs_reg32 *regs;
+	int nregs;
+	void __iomem *base;
+    };
+
+    struct dentry *debugfs_create_regset32(const char *name, umode_t mode,
+				     struct dentry *parent,
+				     struct debugfs_regset32 *regset);
+
+    void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs,
+			 int nregs, void __iomem *base, char *prefix);
+
+The "base" argument may be 0, but you may want to build the reg32 array
+using __stringify, because a number of register names (macros) are actually
+byte offsets over a base for the register block.
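To illustrate the __stringify approach, the userspace sketch below builds a register table from hypothetical MYDEV_* offset macros, so that each entry's name and offset come from a single token. STRINGIFY() is a local copy of the kernel's two-level __stringify() macro, renamed because identifiers beginning with two underscores are reserved in userspace. print_regs32() approximates the "name = 0x%08x" lines that debugfs_print_regs32() is believed to emit. Treat that output format as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace mirror of struct debugfs_reg32. */
struct reg32 {
    const char *name;
    unsigned long offset;
};

/* Hypothetical register map: byte offsets from the block's base. */
#define MYDEV_CTRL    0x00
#define MYDEV_STATUS  0x04
#define MYDEV_IRQ_EN  0x08

/* Two-level stringify, as in the kernel's __stringify(). */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x)   STRINGIFY_1(x)

/* One token yields both the printed name and the offset macro. */
#define DUMP_REG(nm) { .name = STRINGIFY(nm), .offset = MYDEV_##nm }

static const struct reg32 regs[] = {
    DUMP_REG(CTRL), DUMP_REG(STATUS), DUMP_REG(IRQ_EN),
};

/* Approximates debugfs_print_regs32() output into a caller buffer. */
static size_t print_regs32(char *out, size_t len, const struct reg32 *r,
                           int nregs, const void *base, const char *prefix)
{
    size_t used = 0;

    assert(out != NULL && r != NULL && base != NULL);
    if (len > 0)
        out[0] = '\0';
    for (int i = 0; i < nregs; i++) {
        uint32_t val;
        int w;

        /* memcpy() stands in for readl(base + offset) */
        memcpy(&val, (const unsigned char *)base + r[i].offset, sizeof(val));
        w = snprintf(out + used, len - used, "%s%s = 0x%08x\n",
                     prefix ? prefix : "", r[i].name, (unsigned int)val);
        if (w < 0 || (size_t)w >= len - used)
            break;                    /* stop on error or truncation */
        used += (size_t)w;
    }
    return used;
}
```

Because DUMP_REG() pastes the prefix onto the token, adding a register to the dump means adding one MYDEV_* macro and one table entry.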
+
+If you want to dump a u32 array in debugfs, you can create a file with::
+
+    void debugfs_create_u32_array(const char *name, umode_t mode,
+			struct dentry *parent,
+			u32 *array, u32 elements);
+
+The "array" argument provides data, and the "elements" argument is
+the number of elements in the array. Note: once the array is created, its
+size cannot be changed.
+
+There is a helper function to create a device-related seq_file::
+
+   struct dentry *debugfs_create_devm_seqfile(struct device *dev,
+				const char *name,
+				struct dentry *parent,
+				int (*read_fn)(struct seq_file *s,
+					void *data));
+
+The "dev" argument is the device related to this debugfs file, and
+the "read_fn" is a function pointer which will be called to print the
+seq_file content.
+
+There are a couple of other directory-oriented helper functions::
+
+    struct dentry *debugfs_rename(struct dentry *old_dir,
+    				  struct dentry *old_dentry,
+		                  struct dentry *new_dir,
+				  const char *new_name);
+
+    struct dentry *debugfs_create_symlink(const char *name,
+                                          struct dentry *parent,
+				      	  const char *target);
+
+A call to debugfs_rename() will give a new name to an existing debugfs
+file, possibly in a different directory.  The new_name must not exist prior
+to the call; the return value is old_dentry with updated information.
+Symbolic links can be created with debugfs_create_symlink().
+
+There is one important thing that all debugfs users must take into account:
+there is no automatic cleanup of any directories created in debugfs.  If a
+module is unloaded without explicitly removing debugfs entries, the result
+will be a lot of stale pointers and no end of highly antisocial behavior.
+So all debugfs users - at least those which can be built as modules - must
+be prepared to remove all files and directories they create there.  A file
+can be removed with::
+
+    void debugfs_remove(struct dentry *dentry);
+
+The dentry value can be NULL or an error value, in which case nothing will
+be removed.
+
+Once upon a time, debugfs users were required to remember the dentry
+pointer for every debugfs file they created so that all files could be
+cleaned up.  We live in more civilized times now, though, and debugfs users
+can call::
+
+    void debugfs_remove_recursive(struct dentry *dentry);
+
+If this function is passed a pointer for the dentry corresponding to the
+top-level directory, the entire hierarchy below that directory will be
+removed.
+
+.. [1] http://lwn.net/Articles/309298/
diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt
deleted file mode 100644
index dc497b96fa4f..000000000000
--- a/Documentation/filesystems/debugfs.txt
+++ /dev/null
@@ -1,241 +0,0 @@
-Copyright 2009 Jonathan Corbet <corbet@lwn.net>
-
-Debugfs exists as a simple way for kernel developers to make information
-available to user space.  Unlike /proc, which is only meant for information
-about a process, or sysfs, which has strict one-value-per-file rules,
-debugfs has no rules at all.  Developers can put any information they want
-there.  The debugfs filesystem is also intended to not serve as a stable
-ABI to user space; in theory, there are no stability constraints placed on
-files exported there.  The real world is not always so simple, though [1];
-even debugfs interfaces are best designed with the idea that they will need
-to be maintained forever.
-
-Debugfs is typically mounted with a command like:
-
-    mount -t debugfs none /sys/kernel/debug
-
-(Or an equivalent /etc/fstab line).
-The debugfs root directory is accessible only to the root user by
-default. To change access to the tree the "uid", "gid" and "mode" mount
-options can be used.
-
-Note that the debugfs API is exported GPL-only to modules.
-
-Code using debugfs should include <linux/debugfs.h>.  Then, the first order
-of business will be to create at least one directory to hold a set of
-debugfs files:
-
-    struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
-
-This call, if successful, will make a directory called name underneath the
-indicated parent directory.  If parent is NULL, the directory will be
-created in the debugfs root.  On success, the return value is a struct
-dentry pointer which can be used to create files in the directory (and to
-clean it up at the end).  An ERR_PTR(-ERROR) return value indicates that
-something went wrong.  If ERR_PTR(-ENODEV) is returned, that is an
-indication that the kernel has been built without debugfs support and none
-of the functions described below will work.
-
-The most general way to create a file within a debugfs directory is with:
-
-    struct dentry *debugfs_create_file(const char *name, umode_t mode,
-				       struct dentry *parent, void *data,
-				       const struct file_operations *fops);
-
-Here, name is the name of the file to create, mode describes the access
-permissions the file should have, parent indicates the directory which
-should hold the file, data will be stored in the i_private field of the
-resulting inode structure, and fops is a set of file operations which
-implement the file's behavior.  At a minimum, the read() and/or write()
-operations should be provided; others can be included as needed.  Again,
-the return value will be a dentry pointer to the created file,
-ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is
-missing.
-
-Create a file with an initial size, the following function can be used
-instead:
-
-    struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
-				struct dentry *parent, void *data,
-				const struct file_operations *fops,
-				loff_t file_size);
-
-file_size is the initial file size. The other parameters are the same
-as the function debugfs_create_file.
-
-In a number of cases, the creation of a set of file operations is not
-actually necessary; the debugfs code provides a number of helper functions
-for simple situations.  Files containing a single integer value can be
-created with any of:
-
-    void debugfs_create_u8(const char *name, umode_t mode,
-			   struct dentry *parent, u8 *value);
-    void debugfs_create_u16(const char *name, umode_t mode,
-			    struct dentry *parent, u16 *value);
-    struct dentry *debugfs_create_u32(const char *name, umode_t mode,
-				      struct dentry *parent, u32 *value);
-    void debugfs_create_u64(const char *name, umode_t mode,
-			    struct dentry *parent, u64 *value);
-
-These files support both reading and writing the given value; if a specific
-file should not be written to, simply set the mode bits accordingly.  The
-values in these files are in decimal; if hexadecimal is more appropriate,
-the following functions can be used instead:
-
-    void debugfs_create_x8(const char *name, umode_t mode,
-			   struct dentry *parent, u8 *value);
-    void debugfs_create_x16(const char *name, umode_t mode,
-			    struct dentry *parent, u16 *value);
-    void debugfs_create_x32(const char *name, umode_t mode,
-			    struct dentry *parent, u32 *value);
-    void debugfs_create_x64(const char *name, umode_t mode,
-			    struct dentry *parent, u64 *value);
-
-These functions are useful as long as the developer knows the size of the
-value to be exported.  Some types can have different widths on different
-architectures, though, complicating the situation somewhat.  There are
-functions meant to help out in such special cases:
-
-    void debugfs_create_size_t(const char *name, umode_t mode,
-			       struct dentry *parent, size_t *value);
-
-As might be expected, this function will create a debugfs file to represent
-a variable of type size_t.
-
-Similarly, there are helpers for variables of type unsigned long, in decimal
-and hexadecimal:
-
-    struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
-					struct dentry *parent,
-					unsigned long *value);
-    void debugfs_create_xul(const char *name, umode_t mode,
-			    struct dentry *parent, unsigned long *value);
-
-Boolean values can be placed in debugfs with:
-
-    struct dentry *debugfs_create_bool(const char *name, umode_t mode,
-				       struct dentry *parent, bool *value);
-
-A read on the resulting file will yield either Y (for non-zero values) or
-N, followed by a newline.  If written to, it will accept either upper- or
-lower-case values, or 1 or 0.  Any other input will be silently ignored.
-
-Also, atomic_t values can be placed in debugfs with:
-
-    void debugfs_create_atomic_t(const char *name, umode_t mode,
-				 struct dentry *parent, atomic_t *value)
-
-A read of this file will get atomic_t values, and a write of this file
-will set atomic_t values.
-
-Another option is exporting a block of arbitrary binary data, with
-this structure and function:
-
-    struct debugfs_blob_wrapper {
-	void *data;
-	unsigned long size;
-    };
-
-    struct dentry *debugfs_create_blob(const char *name, umode_t mode,
-				       struct dentry *parent,
-				       struct debugfs_blob_wrapper *blob);
-
-A read of this file will return the data pointed to by the
-debugfs_blob_wrapper structure.  Some drivers use "blobs" as a simple way
-to return several lines of (static) formatted text output.  This function
-can be used to export binary information, but there does not appear to be
-any code which does so in the mainline.  Note that all files created with
-debugfs_create_blob() are read-only.
-
-If you want to dump a block of registers (something that happens quite
-often during development, even if little such code reaches mainline.
-Debugfs offers two functions: one to make a registers-only file, and
-another to insert a register block in the middle of another sequential
-file.
-
-    struct debugfs_reg32 {
-	char *name;
-	unsigned long offset;
-    };
-
-    struct debugfs_regset32 {
-	struct debugfs_reg32 *regs;
-	int nregs;
-	void __iomem *base;
-    };
-
-    struct dentry *debugfs_create_regset32(const char *name, umode_t mode,
-				     struct dentry *parent,
-				     struct debugfs_regset32 *regset);
-
-    void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs,
-			 int nregs, void __iomem *base, char *prefix);
-
-The "base" argument may be 0, but you may want to build the reg32 array
-using __stringify, and a number of register names (macros) are actually
-byte offsets over a base for the register block.
-
-If you want to dump an u32 array in debugfs, you can create file with:
-
-    void debugfs_create_u32_array(const char *name, umode_t mode,
-			struct dentry *parent,
-			u32 *array, u32 elements);
-
-The "array" argument provides data, and the "elements" argument is
-the number of elements in the array. Note: Once array is created its
-size can not be changed.
-
-There is a helper function to create device related seq_file:
-
-   struct dentry *debugfs_create_devm_seqfile(struct device *dev,
-				const char *name,
-				struct dentry *parent,
-				int (*read_fn)(struct seq_file *s,
-					void *data));
-
-The "dev" argument is the device related to this debugfs file, and
-the "read_fn" is a function pointer which to be called to print the
-seq_file content.
-
-There are a couple of other directory-oriented helper functions:
-
-    struct dentry *debugfs_rename(struct dentry *old_dir, 
-    				  struct dentry *old_dentry,
-		                  struct dentry *new_dir, 
-				  const char *new_name);
-
-    struct dentry *debugfs_create_symlink(const char *name, 
-                                          struct dentry *parent,
-				      	  const char *target);
-
-A call to debugfs_rename() will give a new name to an existing debugfs
-file, possibly in a different directory.  The new_name must not exist prior
-to the call; the return value is old_dentry with updated information.
-Symbolic links can be created with debugfs_create_symlink().
-
-There is one important thing that all debugfs users must take into account:
-there is no automatic cleanup of any directories created in debugfs.  If a
-module is unloaded without explicitly removing debugfs entries, the result
-will be a lot of stale pointers and no end of highly antisocial behavior.
-So all debugfs users - at least those which can be built as modules - must
-be prepared to remove all files and directories they create there.  A file
-can be removed with:
-
-    void debugfs_remove(struct dentry *dentry);
-
-The dentry value can be NULL or an error value, in which case nothing will
-be removed.
-
-Once upon a time, debugfs users were required to remember the dentry
-pointer for every debugfs file they created so that all files could be
-cleaned up.  We live in more civilized times now, though, and debugfs users
-can call:
-
-    void debugfs_remove_recursive(struct dentry *dentry);
-
-If this function is passed a pointer for the dentry corresponding to the
-top-level directory, the entire hierarchy below that directory will be
-removed.
-
-Notes:
-	[1] http://lwn.net/Articles/309298/
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 8fe848ea04af..ab3b656bbe60 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -57,6 +57,7 @@ Documentation for filesystem implementations.
    btrfs
    ceph
    cramfs
+   debugfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 14a19fa5cf759ea18bc7d692cd8fe326af3c4d0a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:58 +0100
Subject: docs: filesystems: convert dlmfs.txt to ReST

- Add a SPDX header;
- Use copyright symbol;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/efc9e59925723e17d1a4741b11049616c221463e.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/dlmfs.rst | 140 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/dlmfs.txt | 130 ---------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 141 insertions(+), 130 deletions(-)
 create mode 100644 Documentation/filesystems/dlmfs.rst
 delete mode 100644 Documentation/filesystems/dlmfs.txt

diff --git a/Documentation/filesystems/dlmfs.rst b/Documentation/filesystems/dlmfs.rst
new file mode 100644
index 000000000000..68daaa7facf9
--- /dev/null
+++ b/Documentation/filesystems/dlmfs.rst
@@ -0,0 +1,140 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=====
+DLMFS
+=====
+
+A minimal DLM userspace interface implemented via a virtual file
+system.
+
+dlmfs is built with OCFS2 as it requires most of its infrastructure.
+
+:Project web page:    http://ocfs2.wiki.kernel.org
+:Tools web page:      https://github.com/markfasheh/ocfs2-tools
+:OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
+
+All code copyright 2005 Oracle except when otherwise noted.
+
+Credits
+=======
+
+Some code taken from ramfs which is Copyright |copy| 2000 Linus Torvalds
+and Transmeta Corp.
+
+Mark Fasheh <mark.fasheh@oracle.com>
+
+Caveats
+=======
+- Right now it only works with the OCFS2 DLM, though support for other
+  DLM implementations should not be a major issue.
+
+Mount options
+=============
+None
+
+Usage
+=====
+
+If you're just interested in OCFS2, then please see ocfs2.txt. The
+rest of this document is geared towards those who want to use
+dlmfs for easy-to-set-up and easy-to-use clustered locking in
+userspace.
+
+Setup
+=====
+
+dlmfs requires that the OCFS2 cluster infrastructure be in
+place. Please download ocfs2-tools from the above URL and configure a
+cluster.
+
+You'll want to start heartbeating on a volume which all the nodes in
+your lockspace can access. The easiest way to do this is via
+ocfs2_hb_ctl (distributed with ocfs2-tools). Right now it requires
+that an OCFS2 file system be in place so that it can automatically
+find its heartbeat area, though it will eventually support heartbeat
+against raw disks.
+
+Please see the ocfs2_hb_ctl and mkfs.ocfs2 manual pages distributed
+with ocfs2-tools.
+
+Once you're heartbeating, DLM lock 'domains' can be easily created and
+destroyed, and locks within them accessed.
+
+Locking
+=======
+
+Users may access dlmfs via standard file system calls, or they can use
+'libo2dlm' (distributed with ocfs2-tools), which abstracts the file
+system calls and presents a more traditional locking API.
+
+dlmfs handles lock caching automatically for the user, so a lock
+request for an already acquired lock will not generate another DLM
+call. Userspace programs are assumed to handle their own local
+locking.
+
+Two lock levels are supported: Shared Read and Exclusive.
+A Trylock operation is also supported.
+
+For information on the libo2dlm interface, please see o2dlm.h,
+distributed with ocfs2-tools.
+
+Lock value blocks (LVBs) can be read from and written to a resource via
+read(2) and write(2) against the fd obtained via your open(2) call. The
+maximum currently supported LVB length is 64 bytes (though that is an
+OCFS2 DLM limitation). Through this mechanism, users of dlmfs can share
+small amounts of data amongst their nodes.
+
+mkdir(2) signals dlmfs to join a domain (which will have the same name
+as the resulting directory).
+
+rmdir(2) signals dlmfs to leave the domain.
+
+Locks for a given domain are represented by regular inodes inside the
+domain directory.  Locking against them is done via the open(2) system
+call.
+
+The open(2) call will not return until your lock has been granted or
+an error has occurred, unless it has been instructed to do a trylock
+operation. If the lock succeeds, you'll get an fd.
+
+Pass O_CREAT to open(2) to ensure the resource inode is created; dlmfs does
+not automatically create inodes for existing lock resources.
+
+============  ===========================
+Open Flag     Lock Request Type
+============  ===========================
+O_RDONLY      Shared Read
+O_RDWR        Exclusive
+============  ===========================
+
+
+============  ===========================
+Open Flag     Resulting Locking Behavior
+============  ===========================
+O_NONBLOCK    Trylock operation
+============  ===========================
+
+You must provide exactly one of O_RDONLY or O_RDWR.
+
+If O_NONBLOCK is also provided and the trylock operation was valid but
+could not lock the resource, open(2) will fail with ETXTBSY.
+
+close(2) drops the lock associated with your fd.
+
+Modes passed to mkdir(2) or open(2) are adhered to locally. Chown is
+supported locally as well. This means you can use them to restrict
+access to the resources via dlmfs on your local node only.
+
+The resource LVB may be read from the fd in either Shared Read or
+Exclusive modes via the read(2) system call. It can be written via
+write(2) only when opened in Exclusive mode.
+
+Once written, an LVB will be visible to other nodes that obtain Shared
+Read or higher-level locks on the resource.
+
+See Also
+========
+For more information on the VMS distributed locking API, see:
+
+http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf
diff --git a/Documentation/filesystems/dlmfs.txt b/Documentation/filesystems/dlmfs.txt
deleted file mode 100644
index fcf4d509d118..000000000000
--- a/Documentation/filesystems/dlmfs.txt
+++ /dev/null
@@ -1,130 +0,0 @@
-dlmfs
-==================
-A minimal DLM userspace interface implemented via a virtual file
-system.
-
-dlmfs is built with OCFS2 as it requires most of its infrastructure.
-
-Project web page:    http://ocfs2.wiki.kernel.org
-Tools web page:      https://github.com/markfasheh/ocfs2-tools
-OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
-
-All code copyright 2005 Oracle except when otherwise noted.
-
-CREDITS
-=======
-
-Some code taken from ramfs which is Copyright (C) 2000 Linus Torvalds
-and Transmeta Corp.
-
-Mark Fasheh <mark.fasheh@oracle.com>
-
-Caveats
-=======
-- Right now it only works with the OCFS2 DLM, though support for other
-  DLM implementations should not be a major issue.
-
-Mount options
-=============
-None
-
-Usage
-=====
-
-If you're just interested in OCFS2, then please see ocfs2.txt. The
-rest of this document will be geared towards those who want to use
-dlmfs for easy to setup and easy to use clustered locking in
-userspace.
-
-Setup
-=====
-
-dlmfs requires that the OCFS2 cluster infrastructure be in
-place. Please download ocfs2-tools from the above url and configure a
-cluster.
-
-You'll want to start heartbeating on a volume which all the nodes in
-your lockspace can access. The easiest way to do this is via
-ocfs2_hb_ctl (distributed with ocfs2-tools). Right now it requires
-that an OCFS2 file system be in place so that it can automatically
-find its heartbeat area, though it will eventually support heartbeat
-against raw disks.
-
-Please see the ocfs2_hb_ctl and mkfs.ocfs2 manual pages distributed
-with ocfs2-tools.
-
-Once you're heartbeating, DLM lock 'domains' can be easily created /
-destroyed and locks within them accessed.
-
-Locking
-=======
-
-Users may access dlmfs via standard file system calls, or they can use
-'libo2dlm' (distributed with ocfs2-tools) which abstracts the file
-system calls and presents a more traditional locking api.
-
-dlmfs handles lock caching automatically for the user, so a lock
-request for an already acquired lock will not generate another DLM
-call. Userspace programs are assumed to handle their own local
-locking.
-
-Two levels of locks are supported - Shared Read, and Exclusive.
-Also supported is a Trylock operation.
-
-For information on the libo2dlm interface, please see o2dlm.h,
-distributed with ocfs2-tools.
-
-Lock value blocks can be read and written to a resource via read(2)
-and write(2) against the fd obtained via your open(2) call. The
-maximum currently supported LVB length is 64 bytes (though that is an
-OCFS2 DLM limitation). Through this mechanism, users of dlmfs can share
-small amounts of data amongst their nodes.
-
-mkdir(2) signals dlmfs to join a domain (which will have the same name
-as the resulting directory)
-
-rmdir(2) signals dlmfs to leave the domain
-
-Locks for a given domain are represented by regular inodes inside the
-domain directory.  Locking against them is done via the open(2) system
-call.
-
-The open(2) call will not return until your lock has been granted or
-an error has occurred, unless it has been instructed to do a trylock
-operation. If the lock succeeds, you'll get an fd.
-
-open(2) with O_CREAT to ensure the resource inode is created - dlmfs does
-not automatically create inodes for existing lock resources.
-
-Open Flag     Lock Request Type
----------     -----------------
-O_RDONLY      Shared Read
-O_RDWR        Exclusive
-
-Open Flag     Resulting Locking Behavior
----------     --------------------------
-O_NONBLOCK    Trylock operation
-
-You must provide exactly one of O_RDONLY or O_RDWR.
-
-If O_NONBLOCK is also provided and the trylock operation was valid but
-could not lock the resource then open(2) will return ETXTBUSY.
-
-close(2) drops the lock associated with your fd.
-
-Modes passed to mkdir(2) or open(2) are adhered to locally. Chown is
-supported locally as well. This means you can use them to restrict
-access to the resources via dlmfs on your local node only.
-
-The resource LVB may be read from the fd in either Shared Read or
-Exclusive modes via the read(2) system call. It can be written via
-write(2) only when open in Exclusive mode.
-
-Once written, an LVB will be visible to other nodes who obtain Read
-Only or higher level locks on the resource.
-
-See Also
-========
-http://opendlm.sourceforge.net/cvsmirror/opendlm/docs/dlmbook_final.pdf
-
-For more information on the VMS distributed locking API.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index ab3b656bbe60..c6885c7ef781 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -58,6 +58,7 @@ Documentation for filesystem implementations.
    ceph
    cramfs
    debugfs
+   dlmfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From b02a17cb8ae23479c9bf306e96d2dd71422de63f Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:11:59 +0100
Subject: docs: filesystems: convert ecryptfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- use :field: markup;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Tyler Hicks <code@tyhicks.com>
Link: https://lore.kernel.org/r/6e13841ebd00c8d988027115c75c58821bb41a0c.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/ecryptfs.rst | 87 ++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ecryptfs.txt | 77 ------------------------------
 Documentation/filesystems/index.rst    |  1 +
 3 files changed, 88 insertions(+), 77 deletions(-)
 create mode 100644 Documentation/filesystems/ecryptfs.rst
 delete mode 100644 Documentation/filesystems/ecryptfs.txt

diff --git a/Documentation/filesystems/ecryptfs.rst b/Documentation/filesystems/ecryptfs.rst
new file mode 100644
index 000000000000..7236172300ef
--- /dev/null
+++ b/Documentation/filesystems/ecryptfs.rst
@@ -0,0 +1,87 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================
+eCryptfs: A stacked cryptographic filesystem for Linux
+======================================================
+
+eCryptfs is free software. Please see the file COPYING for details.
+For documentation, please see the files in the doc/ subdirectory.  For
+building and installation instructions please see the INSTALL file.
+
+:Maintainer: Phillip Hellewell
+:Lead developer: Michael A. Halcrow <mhalcrow@us.ibm.com>
+:Developers: Michael C. Thompson,
+             Kent Yoder
+:Web Site: http://ecryptfs.sf.net
+
+This software is currently undergoing development. Make sure to
+maintain a backup copy of any data you write into eCryptfs.
+
+eCryptfs requires the userspace tools downloadable from the
+SourceForge site:
+
+http://sourceforge.net/projects/ecryptfs/
+
+Userspace requirements include:
+
+- David Howells' userspace keyring headers and libraries (version
+  1.0 or higher), obtainable from
+  http://people.redhat.com/~dhowells/keyutils/
+- Libgcrypt
+
+
+Notes
+=====
+
+In the beta/experimental releases of eCryptfs, when you upgrade
+eCryptfs, you should copy the files to an unencrypted location and
+then copy the files back into the new eCryptfs mount to migrate the
+files.
+
+
+Mount-wide Passphrase
+=====================
+
+Create a new directory into which eCryptfs will write its encrypted
+files (e.g., /root/crypt).  Then, create the mount point directory
+(e.g., /mnt/crypt).  Now it's time to mount eCryptfs::
+
+    mount -t ecryptfs /root/crypt /mnt/crypt
+
+You should be prompted for a passphrase and a salt (the salt may be
+blank).
+
+Try writing a new file::
+
+    echo "Hello, World" > /mnt/crypt/hello.txt
+
+The operation will complete.  Notice that there is a new file in
+/root/crypt that is at least 12288 bytes in size (depending on your
+host page size).  This is the encrypted underlying file for what you
+just wrote.  To test reading, from start to finish, you need to clear
+the user session keyring::
+
+    keyctl clear @u
+
+Then umount /mnt/crypt and mount again per the instructions given
+above.
+
+::
+
+    cat /mnt/crypt/hello.txt
+
+
+Notes
+=====
+
+eCryptfs version 0.1 should only be mounted on (1) empty directories
+or (2) directories containing files only created by eCryptfs. If you
+mount a directory that has pre-existing files not created by eCryptfs,
+then behavior is undefined. Do not run eCryptfs in higher verbosity
+levels unless you are doing so for the sole purpose of debugging or
+development, since secret values will be written out to the system log
+in that case.
+
+
+Mike Halcrow
+mhalcrow@us.ibm.com
diff --git a/Documentation/filesystems/ecryptfs.txt b/Documentation/filesystems/ecryptfs.txt
deleted file mode 100644
index 01d8a08351ac..000000000000
--- a/Documentation/filesystems/ecryptfs.txt
+++ /dev/null
@@ -1,77 +0,0 @@
-eCryptfs: A stacked cryptographic filesystem for Linux
-
-eCryptfs is free software. Please see the file COPYING for details.
-For documentation, please see the files in the doc/ subdirectory.  For
-building and installation instructions please see the INSTALL file.
-
-Maintainer: Phillip Hellewell
-Lead developer: Michael A. Halcrow <mhalcrow@us.ibm.com>
-Developers: Michael C. Thompson
-            Kent Yoder
-Web Site: http://ecryptfs.sf.net
-
-This software is currently undergoing development. Make sure to
-maintain a backup copy of any data you write into eCryptfs.
-
-eCryptfs requires the userspace tools downloadable from the
-SourceForge site:
-
-http://sourceforge.net/projects/ecryptfs/
-
-Userspace requirements include:
- - David Howells' userspace keyring headers and libraries (version
-   1.0 or higher), obtainable from
-   http://people.redhat.com/~dhowells/keyutils/
- - Libgcrypt
-
-
-NOTES
-
-In the beta/experimental releases of eCryptfs, when you upgrade
-eCryptfs, you should copy the files to an unencrypted location and
-then copy the files back into the new eCryptfs mount to migrate the
-files.
-
-
-MOUNT-WIDE PASSPHRASE
-
-Create a new directory into which eCryptfs will write its encrypted
-files (i.e., /root/crypt).  Then, create the mount point directory
-(i.e., /mnt/crypt).  Now it's time to mount eCryptfs:
-
-mount -t ecryptfs /root/crypt /mnt/crypt
-
-You should be prompted for a passphrase and a salt (the salt may be
-blank).
-
-Try writing a new file:
-
-echo "Hello, World" > /mnt/crypt/hello.txt
-
-The operation will complete.  Notice that there is a new file in
-/root/crypt that is at least 12288 bytes in size (depending on your
-host page size).  This is the encrypted underlying file for what you
-just wrote.  To test reading, from start to finish, you need to clear
-the user session keyring:
-
-keyctl clear @u
-
-Then umount /mnt/crypt and mount again per the instructions given
-above.
-
-cat /mnt/crypt/hello.txt
-
-
-NOTES
-
-eCryptfs version 0.1 should only be mounted on (1) empty directories
-or (2) directories containing files only created by eCryptfs. If you
-mount a directory that has pre-existing files not created by eCryptfs,
-then behavior is undefined. Do not run eCryptfs in higher verbosity
-levels unless you are doing so for the sole purpose of debugging or
-development, since secret values will be written out to the system log
-in that case.
-
-
-Mike Halcrow
-mhalcrow@us.ibm.com
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index c6885c7ef781..d6d69f1c9287 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -59,6 +59,7 @@ Documentation for filesystem implementations.
    cramfs
    debugfs
    dlmfs
+   ecryptfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 06dedb45b79c6550b878244879f33b6e614126bd Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:00 +0100
Subject: docs: filesystems: convert efivarfs.txt to ReST

Trivial changes:

- Add a SPDX header;
- Adjust document title;
- Mark a literal block as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/215691d747055c4ccb038ec7d78d8d1fe87fe2c0.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/efivarfs.rst | 26 ++++++++++++++++++++++++++
 Documentation/filesystems/efivarfs.txt | 23 -----------------------
 Documentation/filesystems/index.rst    |  1 +
 3 files changed, 27 insertions(+), 23 deletions(-)
 create mode 100644 Documentation/filesystems/efivarfs.rst
 delete mode 100644 Documentation/filesystems/efivarfs.txt

diff --git a/Documentation/filesystems/efivarfs.rst b/Documentation/filesystems/efivarfs.rst
new file mode 100644
index 000000000000..90ac65683e7e
--- /dev/null
+++ b/Documentation/filesystems/efivarfs.rst
@@ -0,0 +1,26 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================================
+efivarfs - a (U)EFI variable filesystem
+=======================================
+
+The efivarfs filesystem was created to address the shortcomings of
+using entries in sysfs to maintain EFI variables. The old sysfs EFI
+variables code only supported variables of up to 1024 bytes. This
+limitation existed in version 0.99 of the EFI specification, but was
+removed before any full releases. Since variables can now be larger
+than a single page, sysfs isn't the best interface for this.
+
+Variables can be created, deleted and modified with the efivarfs
+filesystem.
+
+efivarfs is typically mounted like this::
+
+	mount -t efivarfs none /sys/firmware/efi/efivars
+
+Due to the presence of numerous firmware bugs where removing non-standard
+UEFI variables causes the system firmware to fail to POST, efivarfs
+files that are not well-known standardized variables are created
+as immutable files.  This doesn't prevent removal (``chattr -i`` will
+clear the flag), but it does prevent this kind of failure from being
+triggered accidentally.
diff --git a/Documentation/filesystems/efivarfs.txt b/Documentation/filesystems/efivarfs.txt
deleted file mode 100644
index 686a64bba775..000000000000
--- a/Documentation/filesystems/efivarfs.txt
+++ /dev/null
@@ -1,23 +0,0 @@
-
-efivarfs - a (U)EFI variable filesystem
-
-The efivarfs filesystem was created to address the shortcomings of
-using entries in sysfs to maintain EFI variables. The old sysfs EFI
-variables code only supported variables of up to 1024 bytes. This
-limitation existed in version 0.99 of the EFI specification, but was
-removed before any full releases. Since variables can now be larger
-than a single page, sysfs isn't the best interface for this.
-
-Variables can be created, deleted and modified with the efivarfs
-filesystem.
-
-efivarfs is typically mounted like this,
-
-	mount -t efivarfs none /sys/firmware/efi/efivars
-
-Due to the presence of numerous firmware bugs where removing non-standard
-UEFI variables causes the system firmware to fail to POST, efivarfs
-files that are not well-known standardized variables are created
-as immutable files.  This doesn't prevent removal - "chattr -i" will work -
-but it does prevent this kind of failure from being accomplished
-accidentally.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index d6d69f1c9287..4230f49d2732 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -60,6 +60,7 @@ Documentation for filesystem implementations.
    debugfs
    dlmfs
    ecryptfs
+   efivarfs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From e66d8631ddb3306bd9f463324c2d9a5d9dc559f7 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:01 +0100
Subject: docs: filesystems: convert erofs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/402d1d2f7252b8a683f7a9c6867bc5428da64026.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/erofs.rst | 240 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/erofs.txt | 211 -------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 241 insertions(+), 211 deletions(-)
 create mode 100644 Documentation/filesystems/erofs.rst
 delete mode 100644 Documentation/filesystems/erofs.txt

diff --git a/Documentation/filesystems/erofs.rst b/Documentation/filesystems/erofs.rst
new file mode 100644
index 000000000000..bf145171c2bf
--- /dev/null
+++ b/Documentation/filesystems/erofs.rst
@@ -0,0 +1,240 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Enhanced Read-Only File System - EROFS
+======================================
+
+Overview
+========
+
+EROFS stands for Enhanced Read-Only File System. Unlike other read-only
+file systems, it is designed for flexibility and scalability while being
+kept simple and high-performance.
+
+It is designed as a better filesystem solution for the following scenarios:
+
+ - read-only storage media;
+
+ - part of a fully trusted read-only solution, which means it needs to be
+   immutable and bit-for-bit identical to the official golden image for
+   each release due to security and other considerations;
+
+ - saving some extra storage space with guaranteed end-to-end performance
+   by using reduced metadata and transparent file compression, especially
+   on embedded devices with limited memory (e.g. smartphones).
+
+Here are the main features of EROFS:
+
+ - Little endian on-disk design;
+
+ - Currently 4KB block size (nobh) and therefore maximum 16TB address space;
+
+ - Metadata and data can be mixed by design;
+
+ - 2 inode versions for different requirements:
+
+   =====================  ============  =====================================
+                          compact (v1)  extended (v2)
+   =====================  ============  =====================================
+   Inode metadata size    32 bytes      64 bytes
+   Max file size          4 GB          16 EB (also limited by max. vol size)
+   Max uids/gids          65536         4294967296
+   File change time       no            yes (64 + 32-bit timestamp)
+   Max hardlinks          65536         4294967296
+   Metadata reserved      4 bytes       14 bytes
+   =====================  ============  =====================================
+
+ - Support extended attributes (xattrs) as an option;
+
+ - Support xattr inline and tail-end data inline for all files;
+
+ - Support POSIX.1e ACLs by using xattrs;
+
+ - Support transparent file compression as an option:
+   LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
+
+The following git tree provides the file system user-space tools under
+development (e.g. the formatting tool mkfs.erofs):
+
+- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
+
+Bug reports and patches are welcome; please send them to the following
+linux-erofs mailing list:
+
+- linux-erofs mailing list   <linux-erofs@lists.ozlabs.org>
+
+Mount options
+=============
+
+===================    =========================================================
+(no)user_xattr         Set up extended user attributes. Note: xattr is enabled
+                       by default if CONFIG_EROFS_FS_XATTR is selected.
+(no)acl                Set up POSIX Access Control List. Note: acl is enabled
+                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
+cache_strategy=%s      Select a strategy for cached decompression from now on:
+
+		       ==========  =============================================
+                         disabled  In-place I/O decompression only;
+                        readahead  Cache the last incomplete compressed physical
+                                   cluster for further reading. It still does
+                                   in-place I/O decompression for the remaining
+                                   compressed physical clusters;
+                       readaround  Cache both ends of incomplete compressed
+                                   physical clusters for further reading.
+                                   It still does in-place I/O decompression
+                                   for the remaining compressed physical clusters.
+		       ==========  =============================================
+===================    =========================================================
+
+On-disk details
+===============
+
+Summary
+-------
+Unlike other read-only file systems, an EROFS volume is designed
+to be as simple as possible::
+
+                                |-> aligned with the block size
+   ____________________________________________________________
+  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
+  |_|__|_|_____|__________|_____|______|__________|_____|______|
+  0 +1K
+
+All data areas should be aligned with the block size, but metadata areas
+need not be. All metadata can now be observed in two different spaces (views):
+
+ 1. Inode metadata space
+
+    Each valid inode should be aligned with an inode slot, which is a fixed
+    value (32 bytes) and designed to be kept in line with the compact inode size.
+
+    Each inode can be directly found with the following formula:
+         inode offset = meta_blkaddr * block_size + 32 * nid
+
+    ::
+
+                                    |-> aligned with 8B
+                                               |-> followed closely
+        + meta_blkaddr blocks                                      |-> another slot
+         _____________________________________________________________________
+        |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
+        |________|_______|(optional)|(optional)|__(optional)_|_____|__________
+                 |-> aligned with the inode slot size
+                      .                   .
+                    .                         .
+                  .                              .
+                .                                    .
+              .                                         .
+            .                                              .
+          .____________________________________________________|-> aligned with 4B
+          | xattr_ibody_header | shared xattrs | inline xattrs |
+          |____________________|_______________|_______________|
+          |->    12 bytes    <-|->x * 4 bytes<-|               .
+                              .                .                 .
+                        .                      .                   .
+                   .                           .                     .
+               ._______________________________.______________________.
+               | id | id | id | id |  ... | id | ent | ... | ent| ... |
+               |____|____|____|____|______|____|_____|_____|____|_____|
+                                               |-> aligned with 4B
+                                                           |-> aligned with 4B
+
+    An inode can be 32 or 64 bytes long; the two versions are distinguished
+    by i_format, a field common to all inode versions::
+
+        __________________               __________________
+       |     i_format     |             |     i_format     |
+       |__________________|             |__________________|
+       |        ...       |             |        ...       |
+       |                  |             |                  |
+       |__________________| 32 bytes    |                  |
+                                        |                  |
+                                        |__________________| 64 bytes
+
+    Xattrs, extents and inline data follow the corresponding inode with
+    proper alignment, and each of them is optional depending on the data
+    mapping. *Currently*, a total of 4 valid data mappings are supported:
+
+    ==  ====================================================================
+     0  flat file data without data inline (no extent);
+     1  fixed-sized output data compression (with non-compacted indexes);
+     2  flat file data with tail packing data inline (no extent);
+     3  fixed-sized output data compression (with compacted indexes, v5.3+).
+    ==  ====================================================================
+
+    The size of the optional xattrs is indicated by i_xattr_count in the inode
+    header. Large xattrs or xattrs shared by many different files can be
+    stored in the shared xattrs metadata rather than inlined right after the inode.
+
+ 2. Shared xattrs metadata space
+
+    The shared xattrs space is similar to the inode space above: it starts at
+    a specific block indicated by xattr_blkaddr, and entries are laid out one
+    by one with proper alignment.
+
+    Each shared xattr can also be found directly with the following formula:
+         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id
+
+    ::
+
+                               |-> aligned by  4 bytes
+        + xattr_blkaddr blocks                     |-> aligned with 4 bytes
+         _________________________________________________________________________
+        |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
+        |________|_____________|_____________|_____|______________|_______________
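+
+The two lookup formulas above are plain arithmetic.  The following C sketch
+evaluates them, assuming the 4KB block size mentioned earlier; the helper
+names are hypothetical.  It also decodes ``i_format``, assuming (from our
+reading of erofs_fs.h, which should be checked) that bit 0 selects the
+compact or extended inode and bits 1-3 hold the data mapping listed above::
+
+	#include <inttypes.h>
+	#include <stdint.h>
+	#include <stdio.h>
+
+	#define EROFS_BLKSIZ	4096u	/* assumed 4KB block size */
+
+	/* inode offset = meta_blkaddr * block_size + 32 * nid */
+	static uint64_t inode_offset(uint32_t meta_blkaddr, uint64_t nid)
+	{
+		return (uint64_t)meta_blkaddr * EROFS_BLKSIZ + 32 * nid;
+	}
+
+	/* xattr offset = xattr_blkaddr * block_size + 4 * xattr_id */
+	static uint64_t xattr_offset(uint32_t xattr_blkaddr, uint32_t xattr_id)
+	{
+		return (uint64_t)xattr_blkaddr * EROFS_BLKSIZ + 4 * (uint64_t)xattr_id;
+	}
+
+	/* Assumed i_format layout: bit 0 = inode version, bits 1-3 = mapping. */
+	static unsigned int inode_size(uint16_t i_format)
+	{
+		return (i_format & 1) ? 64 : 32;	/* extended : compact */
+	}
+
+	static unsigned int data_mapping(uint16_t i_format)
+	{
+		return (i_format >> 1) & 0x7;
+	}
+
+	int main(void)
+	{
+		printf("%" PRIu64 "\n", inode_offset(1, 36));	/* 5248 */
+		printf("%" PRIu64 "\n", xattr_offset(2, 3));	/* 8204 */
+		printf("%u %u\n", inode_size(0x5), data_mapping(0x5));	/* 64 2 */
+		return 0;
+	}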
+
+Directories
+-----------
+All directories are now organized in a compact on-disk format. Note that
+each directory block is divided into index and name areas in order to support
+random file lookup, and all directory entries are *strictly* recorded in
+alphabetical order to support an improved prefix binary search
+algorithm (see the related source code for details).
+
+::
+
+                     ___________________________
+                    /                           |
+                   /              ______________|________________
+                  /              /              | nameoff1       | nameoffN-1
+     ____________.______________._______________v________________v__________
+    | dirent | dirent | ... | dirent | filename | filename | ... | filename |
+    |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
+         \                           ^
+          \                          |                           * could have
+           \                         |                             trailing '\0'
+            \________________________| nameoff0
+
+                                 Directory block
+
+Note that, apart from giving the offset of the first filename, nameoff0 also
+indicates the total number of directory entries in this block: the fixed-size
+dirents come immediately before the first filename, so no additional on-disk
+field is needed.
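+
+The entry count and the name boundaries can therefore be derived from the
+nameoff fields alone.  The sketch below assumes a 12-byte dirent (an 8-byte
+nid, then the little-endian 16-bit nameoff, a file type byte and a reserved
+byte), which is our reading of erofs_fs.h rather than something stated here;
+the function names are hypothetical::
+
+	#include <stdint.h>
+	#include <stdio.h>
+	#include <string.h>
+
+	#define DIRENT_SIZE	12	/* assumed on-disk dirent size */
+
+	/* nameoff of dirent i: little-endian u16 right after the 8-byte nid */
+	static unsigned int nameoff(const uint8_t *blk, unsigned int i)
+	{
+		const uint8_t *p = blk + i * DIRENT_SIZE + 8;
+
+		return p[0] | (p[1] << 8);
+	}
+
+	static void list_dir_block(const uint8_t *blk, unsigned int blksz)
+	{
+		unsigned int n = nameoff(blk, 0) / DIRENT_SIZE;	/* entry count */
+		unsigned int i;
+
+		for (i = 0; i < n; i++) {
+			const char *name = (const char *)blk + nameoff(blk, i);
+			unsigned int end = i + 1 < n ? nameoff(blk, i + 1) : blksz;
+			size_t len = end - nameoff(blk, i);
+			const char *nul = memchr(name, '\0', len);
+
+			/* the last name may be followed by trailing '\0' padding */
+			if (nul)
+				len = nul - name;
+			printf("%u: %.*s\n", i, (int)len, name);
+		}
+	}
+
+	int main(void)
+	{
+		uint8_t blk[64] = { 0 };
+
+		blk[8] = 24;			/* dirent 0: nameoff0 = 2 * 12 */
+		blk[DIRENT_SIZE + 8] = 25;	/* dirent 1: nameoff1 */
+		memcpy(blk + 24, "abc", 3);	/* names "a" and "bc" */
+		list_dir_block(blk, sizeof(blk));	/* prints "0: a" and "1: bc" */
+		return 0;
+	}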
+
+Compression
+-----------
+Currently, EROFS supports 4KB fixed-sized output transparent file compression,
+as illustrated below::
+
+             |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
+             clusterofs                      clusterofs            clusterofs
+             |                               |                     |   logical data
+    _________v_______________________________v_____________________v_______________
+    ... |    .        |             |        .    |             |  .          | ...
+    ____|____.________|_____________|________.____|_____________|__.__________|____
+        |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
+             size          size          size          size          size
+              .                             .                .                   .
+               .                       .               .                  .
+                .                  .              .                .
+          _______._____________._____________._____________._____________________
+             ... |             |             |             | ... physical data
+          _______|_____________|_____________|_____________|_____________________
+                 |-> cluster <-|-> cluster <-|-> cluster <-|
+                      size          size          size
+
+Currently each on-disk physical cluster can contain 4KB (un)compressed data
+at most. For each logical cluster, there is a corresponding on-disk index to
+describe its cluster type, physical cluster address, etc.
+
+See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
diff --git a/Documentation/filesystems/erofs.txt b/Documentation/filesystems/erofs.txt
deleted file mode 100644
index db6d39c3ae71..000000000000
--- a/Documentation/filesystems/erofs.txt
+++ /dev/null
@@ -1,211 +0,0 @@
-Overview
-========
-
-EROFS file-system stands for Enhanced Read-Only File System. Different
-from other read-only file systems, it aims to be designed for flexibility,
-scalability, but be kept simple and high performance.
-
-It is designed as a better filesystem solution for the following scenarios:
- - read-only storage media or
-
- - part of a fully trusted read-only solution, which means it needs to be
-   immutable and bit-for-bit identical to the official golden image for
-   their releases due to security and other considerations and
-
- - hope to save some extra storage space with guaranteed end-to-end performance
-   by using reduced metadata and transparent file compression, especially
-   for those embedded devices with limited memory (ex, smartphone);
-
-Here is the main features of EROFS:
- - Little endian on-disk design;
-
- - Currently 4KB block size (nobh) and therefore maximum 16TB address space;
-
- - Metadata & data could be mixed by design;
-
- - 2 inode versions for different requirements:
-                          compact (v1)  extended (v2)
-   Inode metadata size:   32 bytes      64 bytes
-   Max file size:         4 GB          16 EB (also limited by max. vol size)
-   Max uids/gids:         65536         4294967296
-   File change time:      no            yes (64 + 32-bit timestamp)
-   Max hardlinks:         65536         4294967296
-   Metadata reserved:     4 bytes       14 bytes
-
- - Support extended attributes (xattrs) as an option;
-
- - Support xattr inline and tail-end data inline for all files;
-
- - Support POSIX.1e ACLs by using xattrs;
-
- - Support transparent file compression as an option:
-   LZ4 algorithm with 4 KB fixed-sized output compression for high performance.
-
-The following git tree provides the file system user-space tools under
-development (ex, formatting tool mkfs.erofs):
->> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
-
-Bugs and patches are welcome, please kindly help us and send to the following
-linux-erofs mailing list:
->> linux-erofs mailing list   <linux-erofs@lists.ozlabs.org>
-
-Mount options
-=============
-
-(no)user_xattr         Setup Extended User Attributes. Note: xattr is enabled
-                       by default if CONFIG_EROFS_FS_XATTR is selected.
-(no)acl                Setup POSIX Access Control List. Note: acl is enabled
-                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
-cache_strategy=%s      Select a strategy for cached decompression from now on:
-                         disabled: In-place I/O decompression only;
-                        readahead: Cache the last incomplete compressed physical
-                                   cluster for further reading. It still does
-                                   in-place I/O decompression for the rest
-                                   compressed physical clusters;
-                       readaround: Cache the both ends of incomplete compressed
-                                   physical clusters for further reading.
-                                   It still does in-place I/O decompression
-                                   for the rest compressed physical clusters.
-
-On-disk details
-===============
-
-Summary
--------
-Different from other read-only file systems, an EROFS volume is designed
-to be as simple as possible:
-
-                                |-> aligned with the block size
-   ____________________________________________________________
-  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
-  |_|__|_|_____|__________|_____|______|__________|_____|______|
-  0 +1K
-
-All data areas should be aligned with the block size, but metadata areas
-may not. All metadatas can be now observed in two different spaces (views):
- 1. Inode metadata space
-    Each valid inode should be aligned with an inode slot, which is a fixed
-    value (32 bytes) and designed to be kept in line with compact inode size.
-
-    Each inode can be directly found with the following formula:
-         inode offset = meta_blkaddr * block_size + 32 * nid
-
-                                |-> aligned with 8B
-                                           |-> followed closely
-    + meta_blkaddr blocks                                      |-> another slot
-     _____________________________________________________________________
-    |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
-    |________|_______|(optional)|(optional)|__(optional)_|_____|__________
-             |-> aligned with the inode slot size
-                  .                   .
-                .                         .
-              .                              .
-            .                                    .
-          .                                         .
-        .                                              .
-      .____________________________________________________|-> aligned with 4B
-      | xattr_ibody_header | shared xattrs | inline xattrs |
-      |____________________|_______________|_______________|
-      |->    12 bytes    <-|->x * 4 bytes<-|               .
-                          .                .                 .
-                    .                      .                   .
-               .                           .                     .
-           ._______________________________.______________________.
-           | id | id | id | id |  ... | id | ent | ... | ent| ... |
-           |____|____|____|____|______|____|_____|_____|____|_____|
-                                           |-> aligned with 4B
-                                                       |-> aligned with 4B
-
-    Inode could be 32 or 64 bytes, which can be distinguished from a common
-    field which all inode versions have -- i_format:
-
-        __________________               __________________
-       |     i_format     |             |     i_format     |
-       |__________________|             |__________________|
-       |        ...       |             |        ...       |
-       |                  |             |                  |
-       |__________________| 32 bytes    |                  |
-                                        |                  |
-                                        |__________________| 64 bytes
-
-    Xattrs, extents, data inline are followed by the corresponding inode with
-    proper alignment, and they could be optional for different data mappings.
-    _currently_ total 4 valid data mappings are supported:
-
-     0  flat file data without data inline (no extent);
-     1  fixed-sized output data compression (with non-compacted indexes);
-     2  flat file data with tail packing data inline (no extent);
-     3  fixed-sized output data compression (with compacted indexes, v5.3+).
-
-    The size of the optional xattrs is indicated by i_xattr_count in inode
-    header. Large xattrs or xattrs shared by many different files can be
-    stored in shared xattrs metadata rather than inlined right after inode.
-
- 2. Shared xattrs metadata space
-    Shared xattrs space is similar to the above inode space, started with
-    a specific block indicated by xattr_blkaddr, organized one by one with
-    proper align.
-
-    Each share xattr can also be directly found by the following formula:
-         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id
-
-                           |-> aligned by  4 bytes
-    + xattr_blkaddr blocks                     |-> aligned with 4 bytes
-     _________________________________________________________________________
-    |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
-    |________|_____________|_____________|_____|______________|_______________
-
-Directories
------------
-All directories are now organized in a compact on-disk format. Note that
-each directory block is divided into index and name areas in order to support
-random file lookup, and all directory entries are _strictly_ recorded in
-alphabetical order in order to support improved prefix binary search
-algorithm (could refer to the related source code).
-
-                 ___________________________
-                /                           |
-               /              ______________|________________
-              /              /              | nameoff1       | nameoffN-1
- ____________.______________._______________v________________v__________
-| dirent | dirent | ... | dirent | filename | filename | ... | filename |
-|___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
-     \                           ^
-      \                          |                           * could have
-       \                         |                             trailing '\0'
-        \________________________| nameoff0
-
-                             Directory block
-
-Note that apart from the offset of the first filename, nameoff0 also indicates
-the total number of directory entries in this block since it is no need to
-introduce another on-disk field at all.
-
-Compression
------------
-Currently, EROFS supports 4KB fixed-sized output transparent file compression,
-as illustrated below:
-
-         |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
-         clusterofs                      clusterofs            clusterofs
-         |                               |                     |   logical data
-_________v_______________________________v_____________________v_______________
-... |    .        |             |        .    |             |  .          | ...
-____|____.________|_____________|________.____|_____________|__.__________|____
-    |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
-         size          size          size          size          size
-          .                             .                .                   .
-           .                       .               .                  .
-            .                  .              .                .
-      _______._____________._____________._____________._____________________
-         ... |             |             |             | ... physical data
-      _______|_____________|_____________|_____________|_____________________
-             |-> cluster <-|-> cluster <-|-> cluster <-|
-                  size          size          size
-
-Currently each on-disk physical cluster can contain 4KB (un)compressed data
-at most. For each logical cluster, there is a corresponding on-disk index to
-describe its cluster type, physical cluster address, etc.
-
-See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 4230f49d2732..03a493b27920 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -61,6 +61,7 @@ Documentation for filesystem implementations.
    dlmfs
    ecryptfs
    efivarfs
+   erofs
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 6e29ad2ea34f63f2b959807370672af569861378 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:02 +0100
Subject: docs: filesystems: convert ext2.txt to ReST

- Add a SPDX header;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Use footnote markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/fde6721f0303259d830391e351dbde48f67f3ec7.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/ext2.rst  | 399 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ext2.txt  | 388 -----------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 400 insertions(+), 388 deletions(-)
 create mode 100644 Documentation/filesystems/ext2.rst
 delete mode 100644 Documentation/filesystems/ext2.txt

diff --git a/Documentation/filesystems/ext2.rst b/Documentation/filesystems/ext2.rst
new file mode 100644
index 000000000000..d83dbbb162e2
--- /dev/null
+++ b/Documentation/filesystems/ext2.rst
@@ -0,0 +1,399 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+
+The Second Extended Filesystem
+==============================
+
+ext2 was originally released in January 1993.  Written by Rémy Card,
+Theodore Ts'o and Stephen Tweedie, it was a major rewrite of the
+Extended Filesystem.  As of April 2001, it was still the predominant
+filesystem in use by Linux.  There are also implementations available
+for NetBSD, FreeBSD, the GNU HURD, Windows 95/98/NT, OS/2 and RISC OS.
+
+Options
+=======
+
+Most defaults are determined by the filesystem superblock, and can be
+set using tune2fs(8). Kernel-determined defaults are indicated by (*).
+
+====================    ===     ================================================
+bsddf			(*)	Makes ``df`` act like BSD.
+minixdf				Makes ``df`` act like Minix.
+
+check=none, nocheck	(*)	Don't do extra checking of bitmaps on mount
+				(check=normal and check=strict options removed)
+
+dax				Use direct access (no page cache).  See
+				Documentation/filesystems/dax.txt.
+
+debug				Extra debugging information is sent to the
+				kernel syslog.  Useful for developers.
+
+errors=continue			Keep going on a filesystem error.
+errors=remount-ro		Remount the filesystem read-only on an error.
+errors=panic			Panic and halt the machine if an error occurs.
+
+grpid, bsdgroups		Give objects the same group ID as their parent.
+nogrpid, sysvgroups		New objects have the group ID of their creator.
+
+nouid32				Use 16-bit UIDs and GIDs.
+
+oldalloc			Enable the old block allocator. Orlov should
+				have better performance; we'd like to get some
+				feedback if it's the contrary for you.
+orlov			(*)	Use the Orlov block allocator.
+				(See http://lwn.net/Articles/14633/ and
+				http://lwn.net/Articles/14446/.)
+
+resuid=n			The user ID which may use the reserved blocks.
+resgid=n			The group ID which may use the reserved blocks.
+
+sb=n				Use alternate superblock at this location.
+
+user_xattr			Enable "user." POSIX Extended Attributes
+				(requires CONFIG_EXT2_FS_XATTR).
+nouser_xattr			Don't support "user." extended attributes.
+
+acl				Enable POSIX Access Control Lists support
+				(requires CONFIG_EXT2_FS_POSIX_ACL).
+noacl				Don't support POSIX ACLs.
+
+nobh				Do not attach buffer_heads to file pagecache.
+
+quota, usrquota			Enable user disk quota support
+				(requires CONFIG_QUOTA).
+
+grpquota			Enable group disk quota support
+				(requires CONFIG_QUOTA).
+====================    ===     ================================================
+
+The noquota option is silently ignored by ext2.
+
+
+Specification
+=============
+
+ext2 shares many properties with traditional Unix filesystems.  It has
+the concepts of blocks, inodes and directories.  It has space in the
+specification for Access Control Lists (ACLs), fragments, undeletion and
+compression though these are not yet implemented (some are available as
+separate patches).  There is also a versioning mechanism to allow new
+features (such as journalling) to be added in a maximally compatible
+manner.
+
+Blocks
+------
+
+The space in the device or file is split up into blocks.  These are
+a fixed size, of 1024, 2048 or 4096 bytes (8192 bytes on Alpha systems),
+which is decided when the filesystem is created.  Smaller blocks mean
+less wasted space per file, but require slightly more accounting overhead,
+and also impose other limits on the size of files and the filesystem.
+
+Block Groups
+------------
+
+Blocks are clustered into block groups in order to reduce fragmentation
+and minimise the amount of head seeking when reading a large amount
+of consecutive data.  Information about each block group is kept in a
+descriptor table stored in the block(s) immediately after the superblock.
+Two blocks near the start of each group are reserved for the block usage
+bitmap and the inode usage bitmap which show which blocks and inodes
+are in use.  Since each bitmap is limited to a single block, with one bit
+per block, a block group can contain at most 8 times as many blocks as
+there are bytes in a block.
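+
+As a worked example, with 4096-byte blocks the block bitmap holds 32768
+bits, so a group covers at most 32768 blocks (128MB).  The sketch below
+computes this bound and the resulting minimum number of groups for a given
+filesystem size; the helper names are hypothetical, and details such as the
+first data block and the inode bitmap limit are ignored::
+
+	#include <inttypes.h>
+	#include <stdint.h>
+	#include <stdio.h>
+
+	/* One bitmap block has 8 * block_size bits, one per block in the group. */
+	static uint32_t max_blocks_per_group(uint32_t block_size)
+	{
+		return 8 * block_size;
+	}
+
+	/* Minimum number of groups needed to cover total_blocks, rounded up. */
+	static uint64_t min_groups(uint64_t total_blocks, uint32_t block_size)
+	{
+		uint64_t per_group = max_blocks_per_group(block_size);
+
+		return (total_blocks + per_group - 1) / per_group;
+	}
+
+	int main(void)
+	{
+		/* 1GB filesystem of 4096-byte blocks = 262144 blocks */
+		printf("%" PRIu32 " blocks/group, %" PRIu64 " groups\n",
+		       max_blocks_per_group(4096), min_groups(262144, 4096));
+		/* prints "32768 blocks/group, 8 groups" */
+		return 0;
+	}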
+
+The block(s) following the bitmaps in each block group are designated
+as the inode table for that block group and the remainder are the data
+blocks.  The block allocation algorithm attempts to allocate data blocks
+in the same block group as the inode which contains them.
+
+The Superblock
+--------------
+
+The superblock contains all the information about the configuration of
+the filing system.  The primary copy of the superblock is stored at an
+offset of 1024 bytes from the start of the device, and it is essential
+to mounting the filesystem.  Since it is so important, backup copies of
+the superblock are stored in block groups throughout the filesystem.
+The first version of ext2 (revision 0) stores a copy at the start of
+every block group, along with backups of the group descriptor block(s).
+Because this can consume a considerable amount of space for large
+filesystems, later revisions can optionally reduce the number of backup
+copies by only putting backups in specific groups (this is the sparse
+superblock feature).  The groups chosen are 0, 1 and powers of 3, 5 and 7.
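+
+The choice of backup groups can be sketched as follows.  This follows the
+description above (the kernel's ext2_group_sparse() implements the same
+rule); the function names here are hypothetical::
+
+	#include <stdbool.h>
+	#include <stdio.h>
+
+	/* true if group == base^k for some k >= 1 */
+	static bool is_power_of(unsigned int group, unsigned int base)
+	{
+		unsigned long long num = base;
+
+		while (num < group)
+			num *= base;
+		return num == group;
+	}
+
+	/* Does this group carry a superblock backup with sparse superblocks? */
+	static bool group_has_backup(unsigned int group)
+	{
+		if (group <= 1)
+			return true;
+		return is_power_of(group, 3) || is_power_of(group, 5) ||
+		       is_power_of(group, 7);
+	}
+
+	int main(void)
+	{
+		unsigned int g;
+
+		/* prints the groups 0 1 3 5 7 9 25 27 49 81 */
+		for (g = 0; g < 100; g++)
+			if (group_has_backup(g))
+				printf("%u ", g);
+		printf("\n");
+		return 0;
+	}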
+
+The information in the superblock contains fields such as the total
+number of inodes and blocks in the filesystem and how many are free,
+how many inodes and blocks are in each block group, when the filesystem
+was mounted (and if it was cleanly unmounted), when it was modified,
+what version of the filesystem it is (see the Revisions section below)
+and which OS created it.
+
+If the filesystem is revision 1 or higher, then there are extra fields,
+such as a volume name, a unique identification number, the inode size,
+and space for optional filesystem features to store configuration info.
+
+All fields in the superblock (as in all other ext2 structures) are stored
+on the disc in little endian format, so a filesystem is portable between
+machines without having to know what machine it was created on.
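+
+Because the on-disk format is little-endian, a program reading the
+superblock should assemble multi-byte fields from individual bytes rather
+than cast the buffer to a host-order structure.  The sketch below assumes
+the caller has already read the 1024 bytes at offset 1024 of the device
+into ``sb``.  The offsets used (``s_log_block_size`` at 24,
+``s_blocks_per_group`` at 32, ``s_magic`` at 56) follow the classic
+superblock layout.  The helper names are hypothetical.
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   #define EXT2_SUPER_MAGIC 0xEF53
+
+   /* Hypothetical helpers: decode little-endian fields on any host. */
+   static uint16_t get_le16(const uint8_t *p)
+   {
+       return (uint16_t)(p[0] | (p[1] << 8));
+   }
+
+   static uint32_t get_le32(const uint8_t *p)
+   {
+       return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
+              ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
+   }
+
+   /* Block size in bytes, or 0 if the magic number does not match. */
+   static uint32_t sb_block_size(const uint8_t *sb)
+   {
+       if (get_le16(sb + 56) != EXT2_SUPER_MAGIC)
+           return 0;
+       return 1024u << get_le32(sb + 24);   /* s_log_block_size */
+   }
+
+   static uint32_t sb_blocks_per_group(const uint8_t *sb)
+   {
+       return get_le32(sb + 32);            /* s_blocks_per_group */
+   }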
+
+Inodes
+------
+
+The inode (index node) is a fundamental concept in the ext2 filesystem.
+Each object in the filesystem is represented by an inode.  The inode
+structure contains pointers to the filesystem blocks which contain the
+data held in the object and all of the metadata about an object except
+its name.  The metadata about an object includes the permissions, owner,
+group, flags, size, number of blocks used, access time, change time,
+modification time, deletion time, number of links, fragments, version
+(for NFS) and extended attributes (EAs) and/or Access Control Lists (ACLs).
+
+There are some reserved fields which are currently unused in the inode
+structure and several which are overloaded.  One field is reserved for the
+directory ACL if the inode is a directory and alternately for the top 32
+bits of the file size if the inode is a regular file (allowing file sizes
+larger than 2GB).  The translator field is unused under Linux, but is used
+by the HURD to reference the inode of a program which will be used to
+interpret this object.  Most of the remaining reserved fields have been
+used up by both Linux and the HURD for larger owner and group fields.
+The HURD also has a larger mode field, so it uses another of the remaining
+fields to store the extra mode bits.
+
+The inode contains pointers to the first 12 blocks of the file's data.
+It also contains a pointer to an indirect block (which contains
+pointers to the next set of blocks), a pointer to a doubly-indirect
+block (which contains pointers to indirect blocks) and a pointer to a
+trebly-indirect block (which contains pointers to doubly-indirect blocks).
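+
+The reach of this pointer tree follows from simple arithmetic.  Block
+numbers are 32 bits wide, so each indirect block holds block_size / 4
+pointers.  The sketch below computes how many data blocks the tree can
+address and which level of the tree maps a given logical block.  The
+function names are hypothetical.  For 1kB blocks the result is 16843020
+blocks, about 16GB, in line with the Limitations table below.  For 4kB
+blocks the tree alone could address about 4TB, so the lower figure in
+that table presumably comes from other limits, such as the 32-bit count
+of 512-byte sectors kept in the inode.
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   #define DIRECT_BLOCKS 12   /* direct pointers held in the inode */
+
+   /* Each pointer in an indirect block is a 32-bit block number. */
+   static uint64_t ptrs_per_block(uint32_t block_size)
+   {
+       return block_size / 4;
+   }
+
+   /* Hypothetical helper: data blocks reachable via the whole tree. */
+   static uint64_t max_mapped_blocks(uint32_t block_size)
+   {
+       uint64_t k = ptrs_per_block(block_size);
+       return DIRECT_BLOCKS + k + k * k + k * k * k;
+   }
+
+   /* Hypothetical helper: 0 = direct, 1 = single, 2 = double,
+    * 3 = triple indirect, -1 = beyond the tree's reach. */
+   static int mapping_depth(uint64_t n, uint32_t block_size)
+   {
+       uint64_t k = ptrs_per_block(block_size);
+       if (n < DIRECT_BLOCKS)
+           return 0;
+       n -= DIRECT_BLOCKS;
+       if (n < k)
+           return 1;
+       n -= k;
+       if (n < k * k)
+           return 2;
+       n -= k * k;
+       if (n < k * k * k)
+           return 3;
+       return -1;
+   }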
+
+The flags field contains some ext2-specific flags which aren't catered
+for by the standard chmod flags.  These flags can be listed with lsattr
+and changed with the chattr command, and allow specific filesystem
+behaviour on a per-file basis.  There are flags for secure deletion,
+undeletable, compression, synchronous updates, immutability, append-only,
+dumpable, no-atime, indexed directories, and data-journaling.  Not all
+of these are supported yet.
+
+Directories
+-----------
+
+A directory is a filesystem object and has an inode just like a file.
+It is a specially formatted file containing records which associate
+each name with an inode number.  Later revisions of the filesystem also
+encode the type of the object (file, directory, symlink, device, fifo,
+socket) to avoid the need to check the inode itself for this information
+(support for taking advantage of this feature does not yet exist in
+Glibc 2.2).
+
+The inode allocation code tries to assign inodes which are in the same
+block group as the directory in which they are first created.
+
+The current implementation of ext2 uses a singly-linked list to store
+the filenames in the directory; a pending enhancement uses hashing of the
+filenames to allow lookup without the need to scan the entire directory.
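+
+A linear lookup in one directory block can be sketched as below.  The
+sketch assumes the record layout used with the FILETYPE feature: a
+32-bit inode number, a 16-bit record length, an 8-bit name length, an
+8-bit file type, then the name.  It follows each record length to the
+next record.  The function names are hypothetical.
+
+.. code-block:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+   #include <string.h>
+
+   static uint16_t de_le16(const uint8_t *p)
+   {
+       return (uint16_t)(p[0] | (p[1] << 8));
+   }
+
+   static uint32_t de_le32(const uint8_t *p)
+   {
+       return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
+              ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
+   }
+
+   /* Hypothetical helper: inode number for name, or 0 if not found. */
+   static uint32_t dir_block_lookup(const uint8_t *block, size_t block_size,
+                                    const char *name)
+   {
+       size_t name_len = strlen(name);
+       size_t off = 0;
+
+       while (off + 8 <= block_size) {
+           const uint8_t *de = block + off;
+           uint32_t inode = de_le32(de);        /* 0 marks an unused record */
+           uint16_t rec_len = de_le16(de + 4);  /* distance to next record */
+           uint8_t len = de[6];                 /* de[7] is the file type */
+
+           if (rec_len < 8 || rec_len > block_size - off)
+               break;                           /* corrupt record: stop */
+           if (inode != 0 && len == name_len && 8u + len <= rec_len &&
+               memcmp(de + 8, name, len) == 0)
+               return inode;
+           off += rec_len;
+       }
+       return 0;
+   }
+
+Every lookup may have to visit each record in turn, which is why large
+directories become slow without a hashed index.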
+
+The current implementation never removes empty directory blocks once they
+have been allocated to hold more files.
+
+Special files
+-------------
+
+Symbolic links are also filesystem objects with inodes.  They deserve
+special mention because the data for them is stored within the inode
+itself if the symlink is less than 60 bytes long.  It uses the fields
+which would normally be used to store the pointers to data blocks.
+This is a worthwhile optimisation as it avoids allocating a full
+block for the symlink, and most symlinks are less than 60 characters long.
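+
+The 60-byte figure follows from the size of the block pointer array.
+The 12 direct pointers plus the single, double and triple indirect
+pointers give 15 pointers of 4 bytes each.  A minimal sketch of the
+resulting test is shown below.  It assumes the target is stored together
+with its terminating NUL byte.  The function name is hypothetical.
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stddef.h>
+   #include <stdint.h>
+
+   #define EXT2_N_BLOCKS 15   /* 12 direct + 3 indirect block pointers */
+
+   /* Hypothetical helper: can the target live in the pointer array? */
+   static bool symlink_fits_in_inode(size_t target_len)
+   {
+       /* target plus NUL must fit in 15 * 4 = 60 bytes */
+       return target_len + 1 <= EXT2_N_BLOCKS * sizeof(uint32_t);
+   }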
+
+Character and block special devices never have data blocks assigned to
+them.  Instead, their device number is stored in the inode, again reusing
+the fields which would be used to point to the data blocks.
+
+Reserved Space
+--------------
+
+In ext2, there is a mechanism for reserving a certain number of blocks
+for a particular user (normally the super-user).  This is intended to
+allow for the system to continue functioning even if non-privileged users
+fill up all the space available to them (this is independent of filesystem
+quotas).  It also keeps the filesystem from filling up entirely, which
+helps combat fragmentation.
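+
+The allocation decision can be sketched as below.  This is a simplified
+illustration, not the kernel's logic.  It considers only the reserved
+block count, the ``resuid``/``resgid`` options described earlier and a
+generic "privileged" flag.  The structure and function names are
+hypothetical.
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stdint.h>
+
+   struct reserve_policy {          /* hypothetical, for illustration */
+       uint32_t reserved_blocks;    /* blocks kept back */
+       uint32_t resuid;             /* user allowed to use the reserve */
+       uint32_t resgid;             /* group allowed to use the reserve */
+   };
+
+   /* Hypothetical helper: may this caller allocate one more block? */
+   static bool may_allocate_block(const struct reserve_policy *p,
+                                  uint32_t free_blocks, uint32_t uid,
+                                  uint32_t gid, bool privileged)
+   {
+       if (free_blocks > p->reserved_blocks)
+           return true;             /* outside the reserve: anyone */
+       if (free_blocks == 0)
+           return false;            /* completely full */
+       /* inside the reserve: only the reserved user/group or privileged */
+       return privileged || uid == p->resuid || gid == p->resgid;
+   }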
+
+Filesystem check
+----------------
+
+At boot time, most systems run a consistency check (e2fsck) on their
+filesystems.  The superblock of the ext2 filesystem contains several
+fields which indicate whether fsck should actually run (since checking
+the filesystem at boot can take a long time if it is large).  fsck will
+run if the filesystem was not cleanly unmounted, if the maximum mount
+count has been exceeded or if the maximum time between checks has been
+exceeded.
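+
+The three conditions above can be sketched as a single predicate.  The
+fields mirror the superblock's ``s_state``, ``s_mnt_count``,
+``s_max_mnt_count``, ``s_lastcheck`` and ``s_checkinterval``, but the
+structure and function are hypothetical.  Treating a non-positive
+maximum count or a zero interval as "disabled" is an assumption about
+how these checks are typically configured.
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stdint.h>
+
+   #define EXT2_VALID_FS 0x0001   /* s_state flag: cleanly unmounted */
+
+   struct fsck_info {             /* hypothetical superblock subset */
+       uint16_t state;            /* s_state */
+       uint16_t mnt_count;        /* s_mnt_count */
+       int16_t  max_mnt_count;    /* s_max_mnt_count, <= 0: disabled */
+       uint32_t lastcheck;        /* s_lastcheck, in seconds */
+       uint32_t checkinterval;    /* s_checkinterval, 0: disabled */
+   };
+
+   /* Hypothetical helper: should a boot-time check run now? */
+   static bool fsck_needed(const struct fsck_info *sb, uint32_t now)
+   {
+       if (!(sb->state & EXT2_VALID_FS))
+           return true;           /* not cleanly unmounted */
+       if (sb->max_mnt_count > 0 &&
+           sb->mnt_count >= (uint16_t)sb->max_mnt_count)
+           return true;           /* mount count limit reached */
+       if (sb->checkinterval != 0 &&
+           now >= sb->lastcheck + sb->checkinterval)
+           return true;           /* too long since the last check */
+       return false;
+   }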
+
+Feature Compatibility
+---------------------
+
+The compatibility feature mechanism used in ext2 is sophisticated.
+It safely allows features to be added to the filesystem, without
+unnecessarily sacrificing compatibility with older versions of the
+filesystem code.  The feature compatibility mechanism is not supported by
+the original revision 0 (EXT2_GOOD_OLD_REV) of ext2, but was introduced in
+revision 1.  There are three 32-bit fields, one for compatible features
+(COMPAT), one for read-only compatible (RO_COMPAT) features and one for
+incompatible (INCOMPAT) features.
+
+These feature flags have specific meanings for the kernel as follows:
+
+A COMPAT flag indicates that a feature is present in the filesystem,
+but the on-disk format is 100% compatible with older on-disk formats, so
+a kernel which didn't know anything about this feature could read/write
+the filesystem without any chance of corrupting the filesystem (or even
+making it inconsistent).  This is essentially just a flag which says
+"this filesystem has a (hidden) feature" that the kernel or e2fsck may
+want to be aware of (more on e2fsck and feature flags later).  The ext3
+HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
+a regular file with data blocks in it so the kernel does not need to
+take any special notice of it if it doesn't understand ext3 journaling.
+
+An RO_COMPAT flag indicates that the on-disk format is 100% compatible
+with older on-disk formats for reading (i.e. the feature does not change
+the visible on-disk format).  However, an old kernel writing to such a
+filesystem could corrupt the filesystem, so this is prevented.  The
+most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
+sparse groups allow file data blocks where superblock/group descriptor
+backups used to live, and an old kernel's ext2_free_blocks() refuses to
+free these blocks, which would lead to inconsistent bitmaps.  An old kernel would also
+get an error if it tried to free a series of blocks which crossed a group
+boundary, but this is a legitimate layout in a SPARSE_SUPER filesystem.
+
+An INCOMPAT flag indicates the on-disk format has changed in some
+way that makes it unreadable by older kernels, or would otherwise
+cause a problem if an old kernel tried to mount it.  FILETYPE is an
+INCOMPAT flag because the file type is stored in the byte that older
+kernels read as the high byte of a 16-bit name length, so they would
+think a filename was longer than 256 characters, which would lead to
+corrupt directory listings.  The COMPRESSION flag is an obvious INCOMPAT
+flag: if the kernel doesn't understand compression, you would just get
+garbage back from
+read() instead of it automatically decompressing your data.  The ext3
+RECOVER flag is needed to prevent a kernel which does not understand the
+ext3 journal from mounting the filesystem without replaying the journal.
+
+e2fsck needs to be stricter than the kernel in its handling of these
+flags.  If it doesn't understand ANY of the COMPAT, RO_COMPAT, or
+INCOMPAT flags it will refuse to check the filesystem, because it has
+no way of verifying whether a given feature is valid or not.  Allowing
+e2fsck to succeed on a filesystem with an unknown feature would give
+the user a false sense of security.  Refusing to check
+a filesystem with unknown features is a good incentive for the user to
+update to the latest e2fsck.  This also means that anyone adding feature
+flags to ext2 also needs to update e2fsck to verify these features.
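+
+The policies in this section can be summarised in a short sketch.  The
+feature bit values are the ones used by the ext2/ext3 headers for the
+features named above.  The ``SUPPORTED_*`` masks, the enum and the
+function names are hypothetical.  They stand in for whatever a
+particular kernel or e2fsck version understands.
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stdint.h>
+
+   #define EXT3_FEATURE_COMPAT_HAS_JOURNAL      0x0004
+   #define EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER  0x0001
+   #define EXT2_FEATURE_INCOMPAT_COMPRESSION    0x0001
+   #define EXT2_FEATURE_INCOMPAT_FILETYPE       0x0002
+   #define EXT3_FEATURE_INCOMPAT_RECOVER        0x0004
+
+   /* Hypothetical: features this implementation understands. */
+   #define SUPPORTED_COMPAT     EXT3_FEATURE_COMPAT_HAS_JOURNAL
+   #define SUPPORTED_RO_COMPAT  EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER
+   #define SUPPORTED_INCOMPAT   EXT2_FEATURE_INCOMPAT_FILETYPE
+
+   enum mount_decision { MOUNT_RW, MOUNT_RO_ONLY, MOUNT_REFUSE };
+
+   /* Kernel-style policy: unknown COMPAT bits are simply ignored. */
+   static enum mount_decision kernel_mount_decision(uint32_t compat,
+                                                    uint32_t ro_compat,
+                                                    uint32_t incompat)
+   {
+       (void)compat;
+       if (incompat & ~(uint32_t)SUPPORTED_INCOMPAT)
+           return MOUNT_REFUSE;       /* cannot even read it safely */
+       if (ro_compat & ~(uint32_t)SUPPORTED_RO_COMPAT)
+           return MOUNT_RO_ONLY;      /* reading is safe, writing is not */
+       return MOUNT_RW;
+   }
+
+   /* e2fsck-style policy: any unknown bit means "refuse to check". */
+   static bool fsck_can_check(uint32_t compat, uint32_t ro_compat,
+                              uint32_t incompat)
+   {
+       return !(compat & ~(uint32_t)SUPPORTED_COMPAT) &&
+              !(ro_compat & ~(uint32_t)SUPPORTED_RO_COMPAT) &&
+              !(incompat & ~(uint32_t)SUPPORTED_INCOMPAT);
+   }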
+
+Metadata
+--------
+
+It is frequently claimed that ext2's asynchronous metadata writes are
+faster than the synchronous metadata scheme used by FFS, but less
+reliable.  In practice, the inconsistencies left by either method are
+equally resolvable by their respective fsck programs.
+
+If you're exceptionally paranoid, there are 3 ways of making metadata
+writes synchronous on ext2:
+
+- per-file if you have the program source: use the O_SYNC flag to open()
+- per-file if you don't have the source: use "chattr +S" on the file
+- per-filesystem: add the "sync" option to mount (or in /etc/fstab)
+
+The first and last are not ext2-specific, but do force the metadata to
+be written synchronously.  See also Journaling below.
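+
+For the first option, a minimal sketch of writing with ``O_SYNC``
+through the POSIX ``open()``/``write()`` interface is shown below.  The
+helper name and file mode are illustrative choices.  With ``O_SYNC``,
+each ``write()`` should return only after the data and the metadata
+needed to retrieve it have reached the disk.
+
+.. code-block:: c
+
+   #include <errno.h>
+   #include <fcntl.h>
+   #include <stddef.h>
+   #include <unistd.h>
+
+   /* Hypothetical helper: 0 on success, -1 on error. */
+   static int write_synchronously(const char *path, const void *buf,
+                                  size_t len)
+   {
+       int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
+       if (fd < 0)
+           return -1;
+
+       const char *p = buf;
+       while (len > 0) {
+           ssize_t n = write(fd, p, len);   /* synchronous with O_SYNC */
+           if (n < 0) {
+               if (errno == EINTR)
+                   continue;                /* interrupted: retry */
+               close(fd);
+               return -1;
+           }
+           p += n;
+           len -= (size_t)n;
+       }
+       return close(fd);
+   }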
+
+Limitations
+-----------
+
+There are various limits imposed by the on-disk layout of ext2.  Other
+limits are imposed by the current implementation of the kernel code.
+Many of the limits are determined at the time the filesystem is first
+created, and depend upon the block size chosen.  The ratio of inodes to
+data blocks is fixed at filesystem creation time, so the only way to
+increase the number of inodes is to increase the size of the filesystem.
+No tools currently exist which can change the ratio of inodes to blocks.
+
+Most of these limits could be overcome with slight changes in the on-disk
+format and using a compatibility flag to signal the format change (at
+the expense of some compatibility).
+
+=====================  =======    =======    =======   ========
+Filesystem block size      1kB        2kB        4kB        8kB
+=====================  =======    =======    =======   ========
+File size limit           16GB      256GB     2048GB     2048GB
+Filesystem size limit   2047GB     8192GB    16384GB    32768GB
+=====================  =======    =======    =======   ========
+
+There is a 2.4 kernel limit of 2048GB for a single block device, so no
+filesystem larger than that can be created at this time.  There is also
+an upper limit on the block size imposed by the page size of the kernel,
+so 8kB blocks are only allowed on Alpha systems (and other architectures
+which support larger pages).
+
+There is an upper limit of 32000 subdirectories in a single directory.
+
+There is a "soft" upper limit of about 10-15k files in a single directory
+with the current linear linked-list directory implementation.  This limit
+stems from performance problems when creating and deleting (and also
+finding) files in such large directories.  Using a hashed directory index
+(under development) allows 100k-1M+ files in a single directory without
+performance problems (although RAM size becomes an issue at this point).
+
+The (meaningless) absolute upper limit of files in a single directory,
+imposed by the maximum directory file size, is over 130 trillion files;
+the realistic limit is obviously much lower.  It would be higher except
+that there are not enough 4-character names to make up unique directory
+entries, so they would have to be 8-character filenames, and even then
+we are fairly close to running out of unique filenames.
+
+Journaling
+----------
+
+A journaling extension to the ext2 code has been developed by Stephen
+Tweedie.  It avoids the risks of metadata corruption and the need to
+wait for e2fsck to complete after a crash, without requiring a change
+to the on-disk ext2 layout.  In a nutshell, the journal is a regular
+file which stores whole metadata (and optionally data) blocks that have
+been modified, prior to writing them into the filesystem.  This means
+it is possible to add a journal to an existing ext2 filesystem without
+the need for data conversion.
+
+Changes to the filesystem (e.g. renaming a file) are stored in a
+transaction in the journal, and a transaction can be either complete or
+incomplete at the time of a crash.  If a transaction is complete at the
+time of a crash (or in the normal case where the system does not crash),
+then any blocks in that transaction are guaranteed to represent a valid
+filesystem state, and are copied into the filesystem.  If a transaction
+is incomplete at the time of the crash, then there is no guarantee of
+consistency for the blocks in that transaction, so they are discarded
+(which means any filesystem changes they represent are also lost).
+
+Check Documentation/filesystems/ext4/ if you want to read more about
+ext4 and journaling.
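+
+The commit rule can be illustrated with a toy model.  This is not how
+the journal is laid out on disk: the structures and names below are
+hypothetical, and a real journal logs block numbers and copies of whole
+blocks rather than single values.  The model shows only that replay
+applies committed transactions in order and discards an incomplete one.
+
+.. code-block:: c
+
+   #include <stdbool.h>
+   #include <stddef.h>
+   #include <stdint.h>
+
+   struct jblock {                /* hypothetical logged block update */
+       size_t   target;           /* filesystem block to overwrite */
+       uint32_t contents;         /* new contents, one word for brevity */
+   };
+
+   struct transaction {           /* hypothetical */
+       const struct jblock *blocks;
+       size_t nblocks;
+       bool committed;            /* commit record reached the journal */
+   };
+
+   /* Copy committed transactions into fs[]; in this model replay stops
+    * at the first incomplete transaction.  Returns the number applied. */
+   static size_t journal_replay(uint32_t *fs, const struct transaction *log,
+                                size_t ntrans)
+   {
+       size_t applied = 0;
+       for (size_t t = 0; t < ntrans; t++) {
+           if (!log[t].committed)
+               break;             /* incomplete: discard it */
+           for (size_t i = 0; i < log[t].nblocks; i++)
+               fs[log[t].blocks[i].target] = log[t].blocks[i].contents;
+           applied++;
+       }
+       return applied;
+   }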
+
+References
+==========
+
+=======================	===============================================
+The kernel source	file:/usr/src/linux/fs/ext2/
+e2fsprogs (e2fsck)	http://e2fsprogs.sourceforge.net/
+Design & Implementation	http://e2fsprogs.sourceforge.net/ext2intro.html
+Journaling (ext3)	ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
+Filesystem Resizing	http://ext2resize.sourceforge.net/
+Compression [1]_	http://e2compr.sourceforge.net/
+=======================	===============================================
+
+Implementations for:
+
+=======================	===========================================================
+Windows 95/98/NT/2000	http://www.chrysocome.net/explore2fs
+Windows 95 [1]_		http://www.yipton.net/content.html#FSDEXT2
+DOS client [1]_		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
+OS/2 [2]_		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
+RISC OS client		http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
+=======================	===========================================================
+
+.. [1] no longer actively developed/supported (as of Apr 2001)
+.. [2] no longer actively developed/supported (as of Mar 2009)
diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt
deleted file mode 100644
index 94c2cf0292f5..000000000000
--- a/Documentation/filesystems/ext2.txt
+++ /dev/null
@@ -1,388 +0,0 @@
-
-The Second Extended Filesystem
-==============================
-
-ext2 was originally released in January 1993.  Written by R\'emy Card,
-Theodore Ts'o and Stephen Tweedie, it was a major rewrite of the
-Extended Filesystem.  It is currently still (April 2001) the predominant
-filesystem in use by Linux.  There are also implementations available
-for NetBSD, FreeBSD, the GNU HURD, Windows 95/98/NT, OS/2 and RISC OS.
-
-Options
-=======
-
-Most defaults are determined by the filesystem superblock, and can be
-set using tune2fs(8). Kernel-determined defaults are indicated by (*).
-
-bsddf			(*)	Makes `df' act like BSD.
-minixdf				Makes `df' act like Minix.
-
-check=none, nocheck	(*)	Don't do extra checking of bitmaps on mount
-				(check=normal and check=strict options removed)
-
-dax				Use direct access (no page cache).  See
-				Documentation/filesystems/dax.txt.
-
-debug				Extra debugging information is sent to the
-				kernel syslog.  Useful for developers.
-
-errors=continue			Keep going on a filesystem error.
-errors=remount-ro		Remount the filesystem read-only on an error.
-errors=panic			Panic and halt the machine if an error occurs.
-
-grpid, bsdgroups		Give objects the same group ID as their parent.
-nogrpid, sysvgroups		New objects have the group ID of their creator.
-
-nouid32				Use 16-bit UIDs and GIDs.
-
-oldalloc			Enable the old block allocator. Orlov should
-				have better performance, we'd like to get some
-				feedback if it's the contrary for you.
-orlov			(*)	Use the Orlov block allocator.
-				(See http://lwn.net/Articles/14633/ and
-				http://lwn.net/Articles/14446/.)
-
-resuid=n			The user ID which may use the reserved blocks.
-resgid=n			The group ID which may use the reserved blocks.
-
-sb=n				Use alternate superblock at this location.
-
-user_xattr			Enable "user." POSIX Extended Attributes
-				(requires CONFIG_EXT2_FS_XATTR).
-nouser_xattr			Don't support "user." extended attributes.
-
-acl				Enable POSIX Access Control Lists support
-				(requires CONFIG_EXT2_FS_POSIX_ACL).
-noacl				Don't support POSIX ACLs.
-
-nobh				Do not attach buffer_heads to file pagecache.
-
-quota, usrquota			Enable user disk quota support
-				(requires CONFIG_QUOTA).
-
-grpquota			Enable group disk quota support
-				(requires CONFIG_QUOTA).
-
-noquota option ls silently ignored by ext2.
-
-
-Specification
-=============
-
-ext2 shares many properties with traditional Unix filesystems.  It has
-the concepts of blocks, inodes and directories.  It has space in the
-specification for Access Control Lists (ACLs), fragments, undeletion and
-compression though these are not yet implemented (some are available as
-separate patches).  There is also a versioning mechanism to allow new
-features (such as journalling) to be added in a maximally compatible
-manner.
-
-Blocks
-------
-
-The space in the device or file is split up into blocks.  These are
-a fixed size, of 1024, 2048 or 4096 bytes (8192 bytes on Alpha systems),
-which is decided when the filesystem is created.  Smaller blocks mean
-less wasted space per file, but require slightly more accounting overhead,
-and also impose other limits on the size of files and the filesystem.
-
-Block Groups
-------------
-
-Blocks are clustered into block groups in order to reduce fragmentation
-and minimise the amount of head seeking when reading a large amount
-of consecutive data.  Information about each block group is kept in a
-descriptor table stored in the block(s) immediately after the superblock.
-Two blocks near the start of each group are reserved for the block usage
-bitmap and the inode usage bitmap which show which blocks and inodes
-are in use.  Since each bitmap is limited to a single block, this means
-that the maximum size of a block group is 8 times the size of a block.
-
-The block(s) following the bitmaps in each block group are designated
-as the inode table for that block group and the remainder are the data
-blocks.  The block allocation algorithm attempts to allocate data blocks
-in the same block group as the inode which contains them.
-
-The Superblock
---------------
-
-The superblock contains all the information about the configuration of
-the filing system.  The primary copy of the superblock is stored at an
-offset of 1024 bytes from the start of the device, and it is essential
-to mounting the filesystem.  Since it is so important, backup copies of
-the superblock are stored in block groups throughout the filesystem.
-The first version of ext2 (revision 0) stores a copy at the start of
-every block group, along with backups of the group descriptor block(s).
-Because this can consume a considerable amount of space for large
-filesystems, later revisions can optionally reduce the number of backup
-copies by only putting backups in specific groups (this is the sparse
-superblock feature).  The groups chosen are 0, 1 and powers of 3, 5 and 7.
-
-The information in the superblock contains fields such as the total
-number of inodes and blocks in the filesystem and how many are free,
-how many inodes and blocks are in each block group, when the filesystem
-was mounted (and if it was cleanly unmounted), when it was modified,
-what version of the filesystem it is (see the Revisions section below)
-and which OS created it.
-
-If the filesystem is revision 1 or higher, then there are extra fields,
-such as a volume name, a unique identification number, the inode size,
-and space for optional filesystem features to store configuration info.
-
-All fields in the superblock (as in all other ext2 structures) are stored
-on the disc in little endian format, so a filesystem is portable between
-machines without having to know what machine it was created on.
-
-Inodes
-------
-
-The inode (index node) is a fundamental concept in the ext2 filesystem.
-Each object in the filesystem is represented by an inode.  The inode
-structure contains pointers to the filesystem blocks which contain the
-data held in the object and all of the metadata about an object except
-its name.  The metadata about an object includes the permissions, owner,
-group, flags, size, number of blocks used, access time, change time,
-modification time, deletion time, number of links, fragments, version
-(for NFS) and extended attributes (EAs) and/or Access Control Lists (ACLs).
-
-There are some reserved fields which are currently unused in the inode
-structure and several which are overloaded.  One field is reserved for the
-directory ACL if the inode is a directory and alternately for the top 32
-bits of the file size if the inode is a regular file (allowing file sizes
-larger than 2GB).  The translator field is unused under Linux, but is used
-by the HURD to reference the inode of a program which will be used to
-interpret this object.  Most of the remaining reserved fields have been
-used up for both Linux and the HURD for larger owner and group fields,
-The HURD also has a larger mode field so it uses another of the remaining
-fields to store the extra more bits.
-
-There are pointers to the first 12 blocks which contain the file's data
-in the inode.  There is a pointer to an indirect block (which contains
-pointers to the next set of blocks), a pointer to a doubly-indirect
-block (which contains pointers to indirect blocks) and a pointer to a
-trebly-indirect block (which contains pointers to doubly-indirect blocks).
-
-The flags field contains some ext2-specific flags which aren't catered
-for by the standard chmod flags.  These flags can be listed with lsattr
-and changed with the chattr command, and allow specific filesystem
-behaviour on a per-file basis.  There are flags for secure deletion,
-undeletable, compression, synchronous updates, immutability, append-only,
-dumpable, no-atime, indexed directories, and data-journaling.  Not all
-of these are supported yet.
-
-Directories
------------
-
-A directory is a filesystem object and has an inode just like a file.
-It is a specially formatted file containing records which associate
-each name with an inode number.  Later revisions of the filesystem also
-encode the type of the object (file, directory, symlink, device, fifo,
-socket) to avoid the need to check the inode itself for this information
-(support for taking advantage of this feature does not yet exist in
-Glibc 2.2).
-
-The inode allocation code tries to assign inodes which are in the same
-block group as the directory in which they are first created.
-
-The current implementation of ext2 uses a singly-linked list to store
-the filenames in the directory; a pending enhancement uses hashing of the
-filenames to allow lookup without the need to scan the entire directory.
-
-The current implementation never removes empty directory blocks once they
-have been allocated to hold more files.
-
-Special files
--------------
-
-Symbolic links are also filesystem objects with inodes.  They deserve
-special mention because the data for them is stored within the inode
-itself if the symlink is less than 60 bytes long.  It uses the fields
-which would normally be used to store the pointers to data blocks.
-This is a worthwhile optimisation as it we avoid allocating a full
-block for the symlink, and most symlinks are less than 60 characters long.
-
-Character and block special devices never have data blocks assigned to
-them.  Instead, their device number is stored in the inode, again reusing
-the fields which would be used to point to the data blocks.
-
-Reserved Space
---------------
-
-In ext2, there is a mechanism for reserving a certain number of blocks
-for a particular user (normally the super-user).  This is intended to
-allow for the system to continue functioning even if non-privileged users
-fill up all the space available to them (this is independent of filesystem
-quotas).  It also keeps the filesystem from filling up entirely which
-helps combat fragmentation.
-
-Filesystem check
-----------------
-
-At boot time, most systems run a consistency check (e2fsck) on their
-filesystems.  The superblock of the ext2 filesystem contains several
-fields which indicate whether fsck should actually run (since checking
-the filesystem at boot can take a long time if it is large).  fsck will
-run if the filesystem was not cleanly unmounted, if the maximum mount
-count has been exceeded or if the maximum time between checks has been
-exceeded.
-
-Feature Compatibility
----------------------
-
-The compatibility feature mechanism used in ext2 is sophisticated.
-It safely allows features to be added to the filesystem, without
-unnecessarily sacrificing compatibility with older versions of the
-filesystem code.  The feature compatibility mechanism is not supported by
-the original revision 0 (EXT2_GOOD_OLD_REV) of ext2, but was introduced in
-revision 1.  There are three 32-bit fields, one for compatible features
-(COMPAT), one for read-only compatible (RO_COMPAT) features and one for
-incompatible (INCOMPAT) features.
-
-These feature flags have specific meanings for the kernel as follows:
-
-A COMPAT flag indicates that a feature is present in the filesystem,
-but the on-disk format is 100% compatible with older on-disk formats, so
-a kernel which didn't know anything about this feature could read/write
-the filesystem without any chance of corrupting the filesystem (or even
-making it inconsistent).  This is essentially just a flag which says
-"this filesystem has a (hidden) feature" that the kernel or e2fsck may
-want to be aware of (more on e2fsck and feature flags later).  The ext3
-HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
-a regular file with data blocks in it so the kernel does not need to
-take any special notice of it if it doesn't understand ext3 journaling.
-
-An RO_COMPAT flag indicates that the on-disk format is 100% compatible
-with older on-disk formats for reading (i.e. the feature does not change
-the visible on-disk format).  However, an old kernel writing to such a
-filesystem would/could corrupt the filesystem, so this is prevented. The
-most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
-sparse groups allow file data blocks where superblock/group descriptor
-backups used to live, and ext2_free_blocks() refuses to free these blocks,
-which would leading to inconsistent bitmaps.  An old kernel would also
-get an error if it tried to free a series of blocks which crossed a group
-boundary, but this is a legitimate layout in a SPARSE_SUPER filesystem.
-
-An INCOMPAT flag indicates the on-disk format has changed in some
-way that makes it unreadable by older kernels, or would otherwise
-cause a problem if an old kernel tried to mount it.  FILETYPE is an
-INCOMPAT flag because older kernels would think a filename was longer
-than 256 characters, which would lead to corrupt directory listings.
-The COMPRESSION flag is an obvious INCOMPAT flag - if the kernel
-doesn't understand compression, you would just get garbage back from
-read() instead of it automatically decompressing your data.  The ext3
-RECOVER flag is needed to prevent a kernel which does not understand the
-ext3 journal from mounting the filesystem without replaying the journal.
-
-For e2fsck, it needs to be more strict with the handling of these
-flags than the kernel.  If it doesn't understand ANY of the COMPAT,
-RO_COMPAT, or INCOMPAT flags it will refuse to check the filesystem,
-because it has no way of verifying whether a given feature is valid
-or not.  Allowing e2fsck to succeed on a filesystem with an unknown
-feature is a false sense of security for the user.  Refusing to check
-a filesystem with unknown features is a good incentive for the user to
-update to the latest e2fsck.  This also means that anyone adding feature
-flags to ext2 also needs to update e2fsck to verify these features.
-
-Metadata
---------
-
-It is frequently claimed that the ext2 implementation of writing
-asynchronous metadata is faster than the ffs synchronous metadata
-scheme but less reliable.  Both methods are equally resolvable by their
-respective fsck programs.
-
-If you're exceptionally paranoid, there are 3 ways of making metadata
-writes synchronous on ext2:
-
-per-file if you have the program source: use the O_SYNC flag to open()
-per-file if you don't have the source: use "chattr +S" on the file
-per-filesystem: add the "sync" option to mount (or in /etc/fstab)
-
-the first and last are not ext2 specific but do force the metadata to
-be written synchronously.  See also Journaling below.
-
-Limitations
------------
-
-There are various limits imposed by the on-disk layout of ext2.  Other
-limits are imposed by the current implementation of the kernel code.
-Many of the limits are determined at the time the filesystem is first
-created, and depend upon the block size chosen.  The ratio of inodes to
-data blocks is fixed at filesystem creation time, so the only way to
-increase the number of inodes is to increase the size of the filesystem.
-No tools currently exist which can change the ratio of inodes to blocks.
-
-Most of these limits could be overcome with slight changes in the on-disk
-format and using a compatibility flag to signal the format change (at
-the expense of some compatibility).
-
-Filesystem block size:     1kB        2kB        4kB        8kB
-
-File size limit:          16GB      256GB     2048GB     2048GB
-Filesystem size limit:  2047GB     8192GB    16384GB    32768GB
-
-There is a 2.4 kernel limit of 2048GB for a single block device, so no
-filesystem larger than that can be created at this time.  There is also
-an upper limit on the block size imposed by the page size of the kernel,
-so 8kB blocks are only allowed on Alpha systems (and other architectures
-which support larger pages).
-
-There is an upper limit of 32000 subdirectories in a single directory.
-
-There is a "soft" upper limit of about 10-15k files in a single directory
-with the current linear linked-list directory implementation.  This limit
-stems from performance problems when creating and deleting (and also
-finding) files in such large directories.  Using a hashed directory index
-(under development) allows 100k-1M+ files in a single directory without
-performance problems (although RAM size becomes an issue at this point).
-
-The (meaningless) absolute upper limit of files in a single directory
-(imposed by the file size, the realistic limit is obviously much less)
-is over 130 trillion files.  It would be higher except there are not
-enough 4-character names to make up unique directory entries, so they
-have to be 8 character filenames, even then we are fairly close to
-running out of unique filenames.
-
-Journaling
-----------
-
-A journaling extension to the ext2 code has been developed by Stephen
-Tweedie.  It avoids the risks of metadata corruption and the need to
-wait for e2fsck to complete after a crash, without requiring a change
-to the on-disk ext2 layout.  In a nutshell, the journal is a regular
-file which stores whole metadata (and optionally data) blocks that have
-been modified, prior to writing them into the filesystem.  This means
-it is possible to add a journal to an existing ext2 filesystem without
-the need for data conversion.
-
-When changes to the filesystem (e.g. a file is renamed) they are stored in
-a transaction in the journal and can either be complete or incomplete at
-the time of a crash.  If a transaction is complete at the time of a crash
-(or in the normal case where the system does not crash), then any blocks
-in that transaction are guaranteed to represent a valid filesystem state,
-and are copied into the filesystem.  If a transaction is incomplete at
-the time of the crash, then there is no guarantee of consistency for
-the blocks in that transaction so they are discarded (which means any
-filesystem changes they represent are also lost).
-Check Documentation/filesystems/ext4/ if you want to read more about
-ext4 and journaling.
-
-References
-==========
-
-The kernel source	file:/usr/src/linux/fs/ext2/
-e2fsprogs (e2fsck)	http://e2fsprogs.sourceforge.net/
-Design & Implementation	http://e2fsprogs.sourceforge.net/ext2intro.html
-Journaling (ext3)	ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
-Filesystem Resizing	http://ext2resize.sourceforge.net/
-Compression (*)		http://e2compr.sourceforge.net/
-
-Implementations for:
-Windows 95/98/NT/2000	http://www.chrysocome.net/explore2fs
-Windows 95 (*)		http://www.yipton.net/content.html#FSDEXT2
-DOS client (*)		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
-OS/2 (+)		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
-RISC OS client		http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
-
-(*) no longer actively developed/supported (as of Apr 2001)
-(+) no longer actively developed/supported (as of Mar 2009)
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 03a493b27920..102b3b65486a 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -62,6 +62,7 @@ Documentation for filesystem implementations.
    ecryptfs
    efivarfs
    erofs
+   ext2
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 7dc62406320c4103bbdeeeecd0a7ef03e3e58009 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:03 +0100
Subject: docs: filesystems: convert ext3.txt to ReST

Nothing really required here. Just renaming would be enough.

Yet, while here, let's add a SPDX header and adjust the document title
to meet the same standard we're using on most docs.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/26960235e3e7c972bd543f5dd59f1ef4f3a877c6.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/ext3.rst  | 14 ++++++++++++++
 Documentation/filesystems/ext3.txt  | 12 ------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 15 insertions(+), 12 deletions(-)
 create mode 100644 Documentation/filesystems/ext3.rst
 delete mode 100644 Documentation/filesystems/ext3.txt

diff --git a/Documentation/filesystems/ext3.rst b/Documentation/filesystems/ext3.rst
new file mode 100644
index 000000000000..c06cec3a8fdc
--- /dev/null
+++ b/Documentation/filesystems/ext3.rst
@@ -0,0 +1,14 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Ext3 Filesystem
+===============
+
+Ext3 was originally released in September 1999. It was written by Stephen
+Tweedie for the 2.2 branch and ported to 2.4 kernels by Peter Braam, Andreas
+Dilger, Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.
+
+Ext3 is the ext2 filesystem enhanced with journalling capabilities. The
+filesystem is a subset of the ext4 filesystem, so use the ext4 driver to
+access ext3 filesystems.
+
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
deleted file mode 100644
index 58758fbef9e0..000000000000
--- a/Documentation/filesystems/ext3.txt
+++ /dev/null
@@ -1,12 +0,0 @@
-
-Ext3 Filesystem
-===============
-
-Ext3 was originally released in September 1999. Written by Stephen Tweedie
-for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
-Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.
-
-Ext3 is the ext2 filesystem enhanced with journalling capabilities. The
-filesystem is a subset of ext4 filesystem so use ext4 driver for accessing
-ext3 filesystems.
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 102b3b65486a..aa2c3d1de3de 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -63,6 +63,7 @@ Documentation for filesystem implementations.
    efivarfs
    erofs
    ext2
+   ext3
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 89272ca1102e000f7dbca724b7b106e688199a5d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:04 +0100
Subject: docs: filesystems: convert f2fs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/8dd156320b0c015dec6d3f848d03ea057042a15b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/f2fs.rst  | 762 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/f2fs.txt  | 730 ----------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 763 insertions(+), 730 deletions(-)
 create mode 100644 Documentation/filesystems/f2fs.rst
 delete mode 100644 Documentation/filesystems/f2fs.txt

diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
new file mode 100644
index 000000000000..d681203728d7
--- /dev/null
+++ b/Documentation/filesystems/f2fs.rst
@@ -0,0 +1,762 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================================
+WHAT IS Flash-Friendly File System (F2FS)?
+==========================================
+
+NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
+been adopted in a wide variety of systems, ranging from mobile devices to
+servers. Since they are known to have characteristics different from those of
+conventional rotating disks, a file system, as the layer above the storage
+device, should adapt to these differences right from its design.
+
+F2FS is a file system designed for NAND flash memory-based storage devices and
+is based on the Log-structured File System (LFS). The design focuses on
+addressing the fundamental issues in LFS: the snowball effect of the wandering
+tree and high cleaning overhead.
+
+Since a NAND flash memory-based storage device shows different characteristics
+depending on its internal geometry or flash memory management scheme, namely
+the Flash Translation Layer (FTL), F2FS and its tools support various
+parameters not only for configuring the on-disk layout, but also for selecting
+allocation and cleaning algorithms.
+
+The following git tree provides the file system formatting tool (mkfs.f2fs),
+a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).
+
+- git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
+
+For reporting bugs and sending patches, please use the following mailing list:
+
+- linux-f2fs-devel@lists.sourceforge.net
+
+Background and Design issues
+============================
+
+Log-structured File System (LFS)
+--------------------------------
+"A log-structured file system writes all modifications to disk sequentially in
+a log-like structure, thereby speeding up both file writing and crash recovery.
+The log is the only structure on disk; it contains indexing information so that
+files can be read back from the log efficiently. In order to maintain large free
+areas on disk for fast writing, we divide the log into segments and use a
+segment cleaner to compress the live information from heavily fragmented
+segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
+implementation of a log-structured file system", ACM Trans. Computer Systems
+10, 1, 26–52.
+
+Wandering Tree Problem
+----------------------
+In LFS, when file data is updated and written to the end of the log, its direct
+pointer block is updated because the data location has changed. Then the
+indirect pointer block is also updated because of the direct pointer block
+update. In this manner, the upper index structures such as the inode, inode
+map, and checkpoint block are also updated recursively. This problem is called
+the wandering tree problem [1]; to improve performance, the file system should
+eliminate or reduce this update propagation as much as possible.
+
+[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
+
+Cleaning Overhead
+-----------------
+Since LFS is based on out-of-place writes, it produces many obsolete blocks
+scattered across the whole storage. To provide new empty log space, it needs
+to reclaim these obsolete blocks transparently to users. This job is called
+the cleaning process.
+
+The process consists of four operations as follows.
+
+1. A victim segment is selected by referencing the segment usage table.
+2. It loads the parent index structures of all the data in the victim, as
+   identified by the segment summary blocks.
+3. It checks the cross-reference between the data and its parent index structure.
+4. It moves valid data selectively.
+
+This cleaning job may cause unexpectedly long delays, so the most important
+goal is to hide these latencies from users. It should also reduce the amount
+of valid data to be moved, and move that data quickly.
+
+Key Features
+============
+
+Flash Awareness
+---------------
+- Enlarge the random write area for better performance, while providing high
+  spatial locality
+- Align FS data structures to the operational units in the FTL on a
+  best-effort basis
+
+Wandering Tree Problem
+----------------------
+- Use the term "node" to represent inodes as well as various pointer blocks
+- Introduce the Node Address Table (NAT), which contains the locations of all
+  the "node" blocks; this cuts off the update propagation.
+
+Cleaning Overhead
+-----------------
+- Support a background cleaning process
+- Support greedy and cost-benefit algorithms for victim selection policies
+- Support multi-head logs for static/dynamic hot and cold data separation
+- Introduce adaptive logging for efficient block allocation
+
+Mount Options
+=============
+
+
+====================== ============================================================
+background_gc=%s       Turn on/off cleaning operations, namely garbage
+                       collection, triggered in background when the I/O
+                       subsystem is idle. If background_gc=on, garbage
+                       collection is turned on; if background_gc=off, it is
+                       turned off. If background_gc=sync, synchronous garbage
+                       collection runs in the background. The default value
+                       is on, so garbage collection is enabled by default.
+disable_roll_forward   Disable the roll-forward recovery routine
+norecovery             Disable the roll-forward recovery routine and mount
+                       read-only (i.e., -o ro,disable_roll_forward)
+discard/nodiscard      Enable/disable real-time discard in f2fs. If discard is
+                       enabled, f2fs will issue discard/TRIM commands when a
+                       segment is cleaned.
+no_heap                Disable heap-style segment allocation, which finds free
+                       segments for data from the beginning of the main area
+                       and for nodes from the end of the main area.
+nouser_xattr           Disable Extended User Attributes. Note: xattr is enabled
+                       by default if CONFIG_F2FS_FS_XATTR is selected.
+noacl                  Disable POSIX Access Control List. Note: acl is enabled
+                       by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
+active_logs=%u         Configure the number of active logs. In the current
+                       design, f2fs supports only 2, 4, and 6 logs. The
+                       default number is 6.
+disable_ext_identify   Disable the extension list configured by mkfs, so f2fs
+                       is not aware of cold files such as media files.
+inline_xattr           Enable the inline xattrs feature.
+noinline_xattr         Disable the inline xattrs feature.
+inline_xattr_size=%u   Configure the inline xattr size; this depends on the
+                       flexible inline xattr feature.
+inline_data            Enable the inline data feature: newly created small
+                       (<~3.4k) files can be written into the inode block.
+inline_dentry          Enable the inline dir feature: data in newly created
+                       directory entries can be written into the inode block.
+                       The space in the inode block used to store inline
+                       dentries is limited to ~3.4k.
+noinline_dentry        Disable the inline dentry feature.
+flush_merge            Merge concurrent cache_flush commands as much as possible
+                       to eliminate redundant command issues. If the underlying
+                       device handles the cache_flush command relatively slowly,
+                       enabling this option is recommended.
+nobarrier              This option can be used if the underlying storage
+                       guarantees that its cached data will be written to the
+                       non-volatile area. If this option is set, no cache_flush
+                       commands are issued, but f2fs still guarantees the write
+                       ordering of all the data writes.
+fastboot               This option is used when a system wants to reduce mount
+                       time as much as possible, even at the cost of normal
+                       performance.
+extent_cache           Enable an rb-tree-based extent cache, which can cache
+                       as many extents mapping contiguous logical addresses to
+                       physical addresses per inode as possible, increasing
+                       the cache hit ratio. Set by default.
+noextent_cache         Explicitly disable the rb-tree-based extent cache; see
+                       the extent_cache mount option above.
+noinline_data          Disable the inline data feature; the inline data feature
+                       is enabled by default.
+data_flush             Enable data flushing before checkpoint in order to
+                       persist the data of regular files and symlinks.
+reserve_root=%d        Configure the reserved space used for allocation by a
+                       privileged user with the specified uid or gid (unit:
+                       4KB). The default limit is 0.2% of user blocks.
+resuid=%d              The user ID which may use the reserved blocks.
+resgid=%d              The group ID which may use the reserved blocks.
+fault_injection=%d     Enable fault injection in all supported types with
+                       specified injection rate.
+fault_type=%d          Configure the fault injection type; this should be
+                       enabled together with the fault_injection option. The
+                       fault type values are shown below; a single type or a
+                       combination of types is supported.
+
+                       ===================	===========
+                       Type_Name		Type_Value
+                       ===================	===========
+                       FAULT_KMALLOC		0x000000001
+                       FAULT_KVMALLOC		0x000000002
+                       FAULT_PAGE_ALLOC		0x000000004
+                       FAULT_PAGE_GET		0x000000008
+                       FAULT_ALLOC_BIO		0x000000010
+                       FAULT_ALLOC_NID		0x000000020
+                       FAULT_ORPHAN		0x000000040
+                       FAULT_BLOCK		0x000000080
+                       FAULT_DIR_DEPTH		0x000000100
+                       FAULT_EVICT_INODE	0x000000200
+                       FAULT_TRUNCATE		0x000000400
+                       FAULT_READ_IO		0x000000800
+                       FAULT_CHECKPOINT		0x000001000
+                       FAULT_DISCARD		0x000002000
+                       FAULT_WRITE_IO		0x000004000
+                       ===================	===========
+mode=%s                Control the block allocation mode, which supports
+                       "adaptive" and "lfs". In "lfs" mode, there should be no
+                       random writes to the main area.
+io_bits=%u             Set the bit size of write IO requests. It should be set
+                       with "mode=lfs".
+usrquota               Enable plain user disk quota accounting.
+grpquota               Enable plain group disk quota accounting.
+prjquota               Enable plain project quota accounting.
+usrjquota=<file>       Specify the quota file and format during mount, so that
+grpjquota=<file>       quota information can be properly updated during the
+prjjquota=<file>       recovery flow. <quota file>: must be in the root
+jqfmt=<quota type>     directory; <quota type>: [vfsold,vfsv0,vfsv1].
+offusrjquota           Turn off user journalled quota.
+offgrpjquota           Turn off group journalled quota.
+offprjjquota           Turn off project journalled quota.
+quota                  Enable plain user disk quota accounting.
+noquota                Disable all plain disk quota options.
+whint_mode=%s          Control which write hints are passed down to the block
+                       layer. This supports "off", "user-based", and
+                       "fs-based".  In "off" mode (default), f2fs does not pass
+                       down hints. In "user-based" mode, f2fs tries to pass
+                       down hints given by users. And in "fs-based" mode, f2fs
+                       passes down hints with its policy.
+alloc_mode=%s          Adjust block allocation policy, which supports "reuse"
+                       and "default".
+fsync_mode=%s          Control the policy of fsync. Currently supports "posix",
+                       "strict", and "nobarrier". In "posix" mode, which is the
+                       default, fsync follows POSIX semantics and performs a
+                       light operation to improve filesystem performance.
+                       In "strict" mode, fsync is heavy and behaves in line
+                       with xfs, ext4 and btrfs, where xfstest generic/342 will
+                       pass, but performance will regress. "nobarrier" is
+                       based on "posix", but does not issue flush commands for
+                       non-atomic files, similarly to the "nobarrier" mount
+                       option.
+test_dummy_encryption  Enable dummy encryption, which provides a fake fscrypt
+                       context. The fake fscrypt context is used by xfstests.
+checkpoint=%s[:%u[%]]  Set to "disable" to turn off checkpointing. Set to "enable"
+                       to re-enable checkpointing. Checkpointing is enabled by
+                       default. While disabled, any unmounting or unexpected
+                       shutdowns will cause the filesystem contents to appear as
+                       they did when the filesystem was mounted with that option.
+                       While mounting with checkpoint=disable, the filesystem must
+                       run garbage collection to ensure that all available space can
+                       be used. If this takes too much time, the mount may return
+                       EAGAIN. You may optionally add a value to indicate how much
+                       of the disk you would be willing to temporarily give up to
+                       avoid additional garbage collection. This can be given as a
+                       number of blocks, or as a percent. For instance, mounting
+                       with checkpoint=disable:100% would always succeed, but it may
+                       hide up to all remaining free space. The actual space that
+                       would be unusable can be viewed at /sys/fs/f2fs/<disk>/unusable.
+                       This space is reclaimed once checkpoint=enable.
+compress_algorithm=%s  Control the compression algorithm; currently f2fs supports
+                       the "lzo" and "lz4" algorithms.
+compress_log_size=%u   Configure the compression cluster size, which will be
+                       4KB * (1 << %u). 16KB is both the minimum and the
+                       default size.
+compress_extension=%s  Add the specified extension, so that f2fs enables
+                       compression on the corresponding files. For example, if
+                       all files with '.ext' have a high compression ratio, we
+                       can add '.ext' to the compression extension list and
+                       enable compression on these files by default rather than
+                       via ioctl. For other files, we can still enable
+                       compression via ioctl.
+====================== ============================================================
+
+Debugfs Entries
+===============
+
+/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
+f2fs. Each file shows the whole f2fs information.
+
+/sys/kernel/debug/f2fs/status includes:
+
+ - major file system information managed by f2fs currently
+ - average SIT information about whole segments
+ - current memory footprint consumed by f2fs.
+
+Sysfs Entries
+=============
+
+Information about mounted f2fs file systems can be found in
+/sys/fs/f2fs.  Each mounted filesystem will have a directory in
+/sys/fs/f2fs based on its device name (e.g., /sys/fs/f2fs/sda).
+The files in each per-device directory are described in
+Documentation/ABI/testing/sysfs-fs-f2fs.
+
+Usage
+=====
+
+1. Download the userland tools and compile them.
+
+2. Skip this step if f2fs was compiled statically into the kernel.
+   Otherwise, insert the f2fs.ko module::
+
+	# insmod f2fs.ko
+
+3. Create a directory to use as the mount point::
+
+	# mkdir /mnt/f2fs
+
+4. Format the block device, and then mount as f2fs::
+
+	# mkfs.f2fs -l label /dev/block_device
+	# mount -t f2fs /dev/block_device /mnt/f2fs
+
+mkfs.f2fs
+---------
+mkfs.f2fs formats a partition as an f2fs filesystem, building the basic
+on-disk layout.
+
+The options consist of:
+
+===============    ===========================================================
+``-l [label]``     Give a volume label, up to 512 Unicode characters.
+``-a [0 or 1]``    Split start location of each area for heap-based allocation.
+
+                   1 is set by default, which performs this.
+``-o [int]``       Set overprovision ratio in percent over volume size.
+
+                   5 is set by default.
+``-s [int]``       Set the number of segments per section.
+
+                   1 is set by default.
+``-z [int]``       Set the number of sections per zone.
+
+                   1 is set by default.
+``-e [str]``       Set basic extension list. e.g. "mp3,gif,mov"
+``-t [0 or 1]``    Issue the discard command (1) or not (0).
+
+                   1 is set by default, which issues discard commands.
+===============    ===========================================================
+
+fsck.f2fs
+---------
+fsck.f2fs checks the consistency of an f2fs-formatted partition by examining
+whether the filesystem metadata and user-made data are cross-referenced
+correctly. Note that the initial version of the tool does not fix any
+inconsistency.
+
+The options consist of::
+
+  -d debug level [default:0]
+
+dump.f2fs
+---------
+dump.f2fs is used to debug the on-disk data structures of the f2fs filesystem.
+It shows the on-disk information of the inode with a given inode number, and
+it can dump all the SSA and SIT entries into the predefined files ./dump_ssa
+and ./dump_sit, respectively.
+
+The options consist of::
+
+  -d debug level [default:0]
+  -i inode no (hex)
+  -s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
+  -a [SSA dump segno from #1~#2 (decimal), for all 0~-1]
+
+Examples::
+
+    # dump.f2fs -i [ino] /dev/sdx
+    # dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
+    # dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
+
+Design
+======
+
+On-disk Layout
+--------------
+
+F2FS divides the whole volume into a number of segments, each of which is fixed
+at 2MB in size. A section is composed of consecutive segments, and a zone
+consists of a set of sections. By default, the section and zone sizes are both
+set to one segment, but users can easily modify the sizes with mkfs.
+
+F2FS splits the entire volume into six areas, and all the areas except the
+superblock consist of multiple segments, as described below::
+
+                                            align with the zone size <-|
+                 |-> align with the segment size
+     _________________________________________________________________________
+    |            |            |   Segment   |    Node     |   Segment  |      |
+    | Superblock | Checkpoint |    Info.    |   Address   |   Summary  | Main |
+    |    (SB)    |   (CP)     | Table (SIT) | Table (NAT) | Area (SSA) |      |
+    |____________|_____2______|______N______|______N______|______N_____|__N___|
+                                                                       .      .
+                                                             .                .
+                                                 .                            .
+                                    ._________________________________________.
+                                    |_Segment_|_..._|_Segment_|_..._|_Segment_|
+                                    .           .
+                                    ._________._________
+                                    |_section_|__...__|_
+                                    .            .
+		                    .________.
+	                            |__zone__|
+
+- Superblock (SB)
+   It is located at the beginning of the partition, and two copies exist to
+   protect the file system against corruption. It contains basic partition
+   information and some default parameters of f2fs.
+
+- Checkpoint (CP)
+   It contains file system information, bitmaps for valid NAT/SIT sets, orphan
+   inode lists, and summary entries of current active segments.
+
+- Segment Information Table (SIT)
+   It contains segment information such as valid block count and bitmap for the
+   validity of all the blocks.
+
+- Node Address Table (NAT)
+   It is composed of a block address table for all the node blocks stored in
+   the Main area.
+
+- Segment Summary Area (SSA)
+   It contains summary entries which contain the owner information of all the
+   data and node blocks stored in the Main area.
+
+- Main Area
+   It contains file and directory data including their indices.
+
+In order to avoid misalignment between the file system and flash-based storage,
+F2FS aligns the start block address of CP with the segment size. Also, it
+aligns the start block address of the Main area with the zone size by
+reserving some segments in the SSA area.
+
+Refer to the following survey for additional technical details.
+https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey
+
+File System Metadata Structure
+------------------------------
+
+F2FS adopts the checkpointing scheme to maintain file system consistency. At
+mount time, F2FS first tries to find the last valid checkpoint data by scanning
+the CP area. In order to reduce the scanning time, F2FS uses only two copies of
+CP. One of them always indicates the last valid data, which is called the
+shadow copy mechanism. In addition to CP, NAT and SIT also adopt the shadow
+copy mechanism.
+
+For file system consistency, each CP indicates which NAT and SIT copies are
+valid, as shown below::
+
+  +--------+----------+---------+
+  |   CP   |    SIT   |   NAT   |
+  +--------+----------+---------+
+  .         .          .          .
+  .            .              .              .
+  .               .                 .                 .
+  +-------+-------+--------+--------+--------+--------+
+  | CP #0 | CP #1 | SIT #0 | SIT #1 | NAT #0 | NAT #1 |
+  +-------+-------+--------+--------+--------+--------+
+     |             ^                          ^
+     |             |                          |
+     `----------------------------------------'
+
+Index Structure
+---------------
+
+The key data structure to manage the data locations is a "node". Similar to
+traditional file structures, F2FS has three types of node: inode, direct node,
+and indirect node. F2FS assigns 4KB to an inode block, which contains 923 data
+block indices, two direct node pointers, two indirect node pointers, and one
+double indirect node pointer as described below. One direct node block
+contains 1018 data block addresses, and one indirect node block also contains
+1018 node block addresses. Thus, one inode block (i.e., a file) covers::
+
+  4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
+
+   Inode block (4KB)
+     |- data (923)
+     |- direct node (2)
+     |          `- data (1018)
+     |- indirect node (2)
+     |            `- direct node (1018)
+     |                       `- data (1018)
+     `- double indirect node (1)
+                         `- indirect node (1018)
+			              `- direct node (1018)
+	                                         `- data (1018)
+
+Note that all the node blocks are mapped by the NAT, which means the location
+of each node is translated through the NAT table. With respect to the wandering
+tree problem, this allows F2FS to cut off the propagation of node updates
+caused by leaf data writes.
+
+Directory Structure
+-------------------
+
+A directory entry occupies 11 bytes, which consists of the following attributes.
+
+- hash		hash value of the file name
+- ino		inode number
+- len		the length of file name
+- type		file type such as directory, symlink, etc
+
+A dentry block consists of 214 dentry slots and file names. Within the block, a
+bitmap indicates whether each dentry is valid. A dentry block occupies 4KB with
+the following composition.
+
+::
+
+  Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
+	              dentries(11 * 214 bytes) + file name (8 * 214 bytes)
+
+                         [Bucket]
+             +--------------------------------+
+             |dentry block 1 | dentry block 2 |
+             +--------------------------------+
+             .               .
+       .                             .
+  .       [Dentry Block Structure: 4KB]       .
+  +--------+----------+----------+------------+
+  | bitmap | reserved | dentries | file names |
+  +--------+----------+----------+------------+
+  [Dentry Block: 4KB] .   .
+		 .               .
+            .                          .
+            +------+------+-----+------+
+            | hash | ino  | len | type |
+            +------+------+-----+------+
+            [Dentry Structure: 11 bytes]
+
+F2FS implements multi-level hash tables for the directory structure. Each level
+has a hash table with a dedicated number of hash buckets, as shown below. Note
+that "A(2B)" means a bucket includes 2 data blocks.
+
+::
+
+    ----------------------
+    A : bucket
+    B : block
+    N : MAX_DIR_HASH_DEPTH
+    ----------------------
+
+    level #0   | A(2B)
+	    |
+    level #1   | A(2B) - A(2B)
+	    |
+    level #2   | A(2B) - A(2B) - A(2B) - A(2B)
+	.     |   .       .       .       .
+    level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
+	.     |   .       .       .       .
+    level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
+
+The numbers of blocks and buckets are determined by::
+
+                            ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
+  # of blocks in level #n = |
+                            `- 4, Otherwise
+
+                             ,- 2^(n + dir_level),
+			     |        if n + dir_level < MAX_DIR_HASH_DEPTH / 2,
+  # of buckets in level #n = |
+                             `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1),
+			              Otherwise
+
+When F2FS looks up a file name in a directory, it first calculates the hash
+value of the file name. Then, F2FS scans the hash table in level #0 to find the
+dentry consisting of the file name and its inode number. If not found, F2FS
+scans the next hash table in level #1. In this way, F2FS scans the hash tables
+in each level incrementally from 1 to N. In each level, F2FS needs to scan only
+one bucket, determined by the following equation, which gives
+O(log(# of files)) complexity::
+
+  bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
+
+In the case of file creation, F2FS finds empty consecutive slots that cover the
+file name. F2FS searches for the empty slots in the hash tables of all levels,
+from 1 to N, in the same way as the lookup operation.
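+
+A minimal sketch of the slot search inside one dentry block follows; the
+caller would repeat it for each block of the bucket chosen in each level. It
+assumes each slot holds 8 bytes of file name and that the dentry bitmap marks
+used slots with set bits in little-endian bit order. Both are assumptions for
+illustration, and the function names are hypothetical.
+
```c
#include <stdbool.h>

#define SLOT_LEN 8u  /* assumed bytes of file name stored per slot */

/* Slots needed for a name of name_len bytes, rounded up. */
unsigned int slots_for_name(unsigned int name_len)
{
	return (name_len + SLOT_LEN - 1) / SLOT_LEN;
}

/* Bit i of the dentry bitmap, little-endian bit order (assumption). */
static bool slot_used(const unsigned char *bitmap, unsigned int i)
{
	return bitmap[i / 8] & (1u << (i % 8));
}

/*
 * Index of the first run of `need` consecutive free slots among the first
 * nr_slots slots, or -1 if this block has no such run.
 */
int find_free_slots(const unsigned char *bitmap, unsigned int nr_slots,
		    unsigned int need)
{
	unsigned int run = 0;

	if (need == 0)
		return 0;
	for (unsigned int i = 0; i < nr_slots; i++) {
		if (slot_used(bitmap, i)) {
			run = 0;        /* a used slot breaks the run */
			continue;
		}
		if (++run == need)
			return (int)(i + 1 - need);
	}
	return -1;
}
```
+
+For example, a 9-byte name needs two slots; if only slots 0, 1 and 3 are in
+use, find_free_slots() returns 4, the start of the first run of two free slots.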
+
+The following figure shows an example of two cases holding children::
+
+       --------------> Dir <--------------
+       |                                 |
+    child                             child
+
+    child - child                     [hole] - child
+
+    child - child - child             [hole] - [hole] - child
+
+   Case 1:                           Case 2:
+   Number of children = 6,           Number of children = 3,
+   File size = 7                     File size = 7
+
+Default Block Allocation
+------------------------
+
+At runtime, F2FS manages six active logs inside the "Main" area: Hot/Warm/Cold
+node and Hot/Warm/Cold data.
+
+- Hot node contains direct node blocks of directories.
+- Warm node contains direct node blocks except hot node blocks.
+- Cold node contains indirect node blocks.
+- Hot data contains dentry blocks.
+- Warm data contains data blocks except hot and cold data blocks.
+- Cold data contains multimedia data or migrated data blocks.
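+
+The classification above can be expressed as a small decision function. The
+struct blk_desc and its flags are hypothetical; F2FS derives this information
+internally from the inode and the block type, so treat this as a sketch of the
+rules in the list rather than of the kernel code.
+
```c
#include <stdbool.h>

/* The six active logs in the Main area. */
enum log_type { HOT_NODE, WARM_NODE, COLD_NODE, HOT_DATA, WARM_DATA, COLD_DATA };

/* Hypothetical summary of a block about to be written. */
struct blk_desc {
	bool is_node;       /* node block (inode or pointer block) vs. data block */
	bool is_indirect;   /* node blocks: indirect rather than direct node */
	bool owner_is_dir;  /* node blocks: owned by a directory */
	bool is_dentry;     /* data blocks: directory entry block */
	bool is_cold;       /* data blocks: multimedia or migrated data */
};

/* Choose the active log for a block, following the list above. */
enum log_type pick_log(const struct blk_desc *b)
{
	if (b->is_node) {
		if (b->is_indirect)
			return COLD_NODE;
		return b->owner_is_dir ? HOT_NODE : WARM_NODE;
	}
	if (b->is_dentry)
		return HOT_DATA;
	return b->is_cold ? COLD_DATA : WARM_DATA;
}
```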
+
+LFS has two schemes for free space management: threaded log and
+copy-and-compaction. The copy-and-compaction scheme, known as cleaning, is
+well-suited for devices showing very good sequential write performance, since
+free segments are always available for writing new data. However, it suffers
+from cleaning overhead under high utilization. Conversely, the threaded log
+scheme suffers from random writes, but needs no cleaning process. F2FS adopts a
+hybrid scheme: the copy-and-compaction scheme is used by default, but the
+policy is dynamically changed to the threaded log scheme according to the file
+system status.
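+
+The hybrid policy can be sketched as a simple switch on free space. The
+threshold parameter min_free_pct is hypothetical: this document only says the
+policy depends on the file system status, so the real trigger condition may
+differ.
+
```c
/* The two LFS free-space management schemes. */
enum alloc_policy { COPY_AND_COMPACTION, THREADED_LOG };

/*
 * Hypothetical trigger: use copy-and-compaction (append to clean segments)
 * while at least min_free_pct percent of segments are free, otherwise fall
 * back to threaded logging into the invalid blocks of dirty segments.
 */
enum alloc_policy choose_policy(unsigned int free_segs, unsigned int total_segs,
				unsigned int min_free_pct)
{
	unsigned long long free_scaled = (unsigned long long)free_segs * 100;
	unsigned long long limit = (unsigned long long)total_segs * min_free_pct;

	return free_scaled < limit ? THREADED_LOG : COPY_AND_COMPACTION;
}
```
+
+With min_free_pct = 10 on a 100-segment volume, 50 free segments keep
+copy-and-compaction, while 5 free segments switch to threaded logging.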
+
+In order to align F2FS with the underlying flash-based storage, F2FS allocates
+segments in units of sections. F2FS expects the section size to be the same as
+the garbage collection unit size of the FTL. Furthermore, with respect to the
+mapping granularity of the FTL, F2FS allocates each section of the active logs
+from a different zone as much as possible, since the FTL can write the data of
+the active logs into one allocation unit according to its mapping granularity.
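+
+The section and zone arithmetic behind this policy is sketched below.
+segs_per_sec and secs_per_zone stand for the mkfs.f2fs -s and -z settings; the
+selection function pick_section() and its inputs are hypothetical and only
+illustrate the "different zones as much as possible" rule.
+
```c
/* Volume geometry: mkfs.f2fs -s (segments per section), -z (sections per zone). */
struct geom {
	unsigned int segs_per_sec;
	unsigned int secs_per_zone;
};

unsigned int seg_to_sec(const struct geom *g, unsigned int segno)
{
	return segno / g->segs_per_sec;
}

unsigned int sec_to_zone(const struct geom *g, unsigned int secno)
{
	return secno / g->secs_per_zone;
}

/* True if `zone` already hosts another active log. */
static int zone_busy(unsigned int zone, const unsigned int *busy_zones, int nbusy)
{
	for (int i = 0; i < nbusy; i++)
		if (busy_zones[i] == zone)
			return 1;
	return 0;
}

/*
 * Hypothetical selection for a new active-log section: prefer a free section
 * in a zone no other active log uses; if every candidate's zone is busy,
 * share a zone. Returns -1 when there is no free section at all.
 */
int pick_section(const struct geom *g, const unsigned int *free_secs, int nfree,
		 const unsigned int *busy_zones, int nbusy)
{
	if (nfree <= 0)
		return -1;
	for (int i = 0; i < nfree; i++)
		if (!zone_busy(sec_to_zone(g, free_secs[i]), busy_zones, nbusy))
			return (int)free_secs[i];
	return (int)free_secs[0];
}
```
+
+For example, with 2 segments per section and 4 sections per zone, segment 9
+belongs to section 4, which lies in zone 1.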
+
+Cleaning process
+----------------
+
+F2FS does cleaning both on demand and in the background. On-demand cleaning is
+triggered when there are not enough free segments to serve VFS calls. The
+background cleaner runs as a kernel thread and triggers the cleaning job when
+the system is idle.
+
+F2FS supports two victim selection policies: the greedy and cost-benefit
+algorithms. In the greedy algorithm, F2FS selects the victim segment having the
+smallest number of valid blocks. In the cost-benefit algorithm, F2FS selects a
+victim segment according to the segment age and the number of valid blocks, in
+order to address the log block thrashing problem of the greedy algorithm. F2FS
+uses the greedy algorithm for the on-demand cleaner, while the background
+cleaner uses the cost-benefit algorithm.
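+
+The two victim selection policies can be contrasted in a short sketch. The
+greedy function follows the description above directly. For cost-benefit, this
+sketch substitutes the textbook LFS ratio from Rosenblum and Ousterhout,
+benefit/cost = (1 - u) * age / (1 + u) with u the fraction of valid blocks;
+F2FS computes its own scaled variant, so exact scores differ. The struct and
+function names are hypothetical.
+
```c
/* Per-segment information the cleaner consults (hypothetical layout). */
struct seg_usage {
	unsigned int valid_blocks;   /* number of valid blocks in the segment */
	unsigned long long mtime;    /* last modification time of the segment */
};

/* Greedy: the segment with the fewest valid blocks needs the least copying. */
int pick_greedy(const struct seg_usage *s, int n)
{
	int best = -1;

	for (int i = 0; i < n; i++)
		if (best < 0 || s[i].valid_blocks < s[best].valid_blocks)
			best = i;
	return best;
}

/*
 * Cost-benefit: maximize (1 - u) * age / (1 + u), where u is the fraction
 * of valid blocks and age = now - mtime (now must not precede any mtime).
 * Old, mostly invalid segments score highest.
 */
int pick_cost_benefit(const struct seg_usage *s, int n,
		      unsigned int blocks_per_seg, unsigned long long now)
{
	int best = -1;
	double best_score = -1.0;

	for (int i = 0; i < n; i++) {
		double u = (double)s[i].valid_blocks / blocks_per_seg;
		double age = (double)(now - s[i].mtime);
		double score = (1.0 - u) * age / (1.0 + u);

		if (score > best_score) {
			best_score = score;
			best = i;
		}
	}
	return best;
}
```
+
+Given three segments with 100, 50 and 60 valid blocks out of 512 and ages of
+10, 1 and 90, greedy picks the second segment, while cost-benefit picks the
+third, much older one.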
+
+In order to identify whether the data in the victim segment are valid or not,
+F2FS manages a bitmap. Each bit represents the validity of a block, and the
+bitmap is composed of a bit stream covering all blocks in the Main area.
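+
+A sketch of testing, updating and counting bits in such a validity bitmap
+follows. It assumes 512 blocks per segment (a 2MB segment of 4KB blocks) and a
+most-significant-bit-first order within each byte; the bit order and the helper
+names are assumptions for illustration.
+
```c
#include <stdbool.h>
#include <stdint.h>

#define BLKS_PER_SEG 512u  /* 2MB segment / 4KB block */

/* Mask for block `blk` inside its byte, MSB first (assumed order). */
static uint8_t bit_mask(uint64_t blk)
{
	return (uint8_t)(0x80u >> (blk % 8));
}

bool block_valid(const uint8_t *bitmap, uint64_t blk)
{
	return bitmap[blk / 8] & bit_mask(blk);
}

void set_block_valid(uint8_t *bitmap, uint64_t blk, bool valid)
{
	if (valid)
		bitmap[blk / 8] |= bit_mask(blk);
	else
		bitmap[blk / 8] &= (uint8_t)~bit_mask(blk);
}

/* Valid-block count of one segment, as a cleaner would use for scoring. */
unsigned int segment_valid_blocks(const uint8_t *bitmap, unsigned int segno)
{
	uint64_t first = (uint64_t)segno * BLKS_PER_SEG;
	unsigned int count = 0;

	for (uint64_t b = first; b < first + BLKS_PER_SEG; b++)
		count += block_valid(bitmap, b);
	return count;
}
```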
+
+Write-hint Policy
+-----------------
+
+1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
+
+2) whint_mode=user-based. F2FS tries to pass down hints given by
+users.
+
+===================== ======================== ===================
+User                  F2FS                     Block
+===================== ======================== ===================
+                      META                     WRITE_LIFE_NOT_SET
+                      HOT_NODE                 "
+                      WARM_NODE                "
+                      COLD_NODE                "
+ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+extension list        "                        "
+
+-- buffered io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+WRITE_LIFE_NONE       "                        "
+WRITE_LIFE_MEDIUM     "                        "
+WRITE_LIFE_LONG       "                        "
+
+-- direct io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+===================== ======================== ===================
+
+3) whint_mode=fs-based. F2FS passes down hints with its policy.
+
+===================== ======================== ===================
+User                  F2FS                     Block
+===================== ======================== ===================
+                      META                     WRITE_LIFE_MEDIUM
+                      HOT_NODE                 WRITE_LIFE_NOT_SET
+                      WARM_NODE                "
+                      COLD_NODE                WRITE_LIFE_NONE
+ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+extension list        "                        "
+
+-- buffered io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
+WRITE_LIFE_NONE       "                        "
+WRITE_LIFE_MEDIUM     "                        "
+WRITE_LIFE_LONG       "                        "
+
+-- direct io
+WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
+WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
+WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
+WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
+WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
+WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+===================== ======================== ===================
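+
+The data rows of both tables above can be summarized in a single function. The
+enums are local stand-ins for the kernel's write-life hints and F2FS data
+temperatures, and data_block_hint() is a hypothetical name. It covers only hints
+coming from users, not the ioctl(COLD) and extension-list rows, which always map
+to COLD_DATA and WRITE_LIFE_EXTREME.
+
```c
#include <stdbool.h>

/* Local stand-ins for the write-life hints and F2FS data temperatures. */
enum whint { WRITE_LIFE_NOT_SET, WRITE_LIFE_NONE, WRITE_LIFE_SHORT,
	     WRITE_LIFE_MEDIUM, WRITE_LIFE_LONG, WRITE_LIFE_EXTREME };
enum temp { HOT_DATA, WARM_DATA, COLD_DATA };
enum whint_mode { WHINT_OFF, WHINT_USER_BASED, WHINT_FS_BASED };

/* User hint -> F2FS data temperature (identical in both tables). */
enum temp hint_to_temp(enum whint user)
{
	switch (user) {
	case WRITE_LIFE_EXTREME:
		return COLD_DATA;
	case WRITE_LIFE_SHORT:
		return HOT_DATA;
	default:
		return WARM_DATA;
	}
}

/* Hint passed to the block layer for a regular data write. */
enum whint data_block_hint(enum whint_mode mode, enum whint user, bool direct_io)
{
	enum temp t = hint_to_temp(user);

	if (mode == WHINT_OFF)
		return WRITE_LIFE_NOT_SET;
	if (t == COLD_DATA)
		return WRITE_LIFE_EXTREME;
	if (t == HOT_DATA)
		return WRITE_LIFE_SHORT;
	/* WARM_DATA: direct I/O forwards the user's hint unchanged. */
	if (direct_io)
		return user;
	/* Buffered I/O: a mode-specific default replaces the user's hint. */
	return mode == WHINT_FS_BASED ? WRITE_LIFE_LONG : WRITE_LIFE_NOT_SET;
}
```
+
+The design point the tables encode is that hot and cold data always carry a
+fixed hint, while warm data either forwards the user's hint (direct I/O) or
+receives a default chosen by the mode (buffered I/O).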
+
+Fallocate(2) Policy
+-------------------
+
+The default policy follows the POSIX rule below.
+
+Allocating disk space
+    The default operation (i.e., mode is zero) of fallocate() allocates
+    the disk space within the range specified by offset and len.  The
+    file size (as reported by stat(2)) will be changed if offset+len is
+    greater than the file size.  Any subregion within the range specified
+    by offset and len that did not contain data before the call will be
+    initialized to zero.  This default behavior closely resembles the
+    behavior of the posix_fallocate(3) library function, and is intended
+    as a method of optimally implementing that function.
+
+However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) prior to
+fallocate(fd, DEFAULT_MODE), it allocates on-disk block addresses that may hold
+zero or random data, which is useful in the following scenario:
+
+ 1. create(fd)
+ 2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
+ 3. fallocate(fd, 0, 0, size)
+ 4. address = fibmap(fd, offset)
+ 5. open(blkdev)
+ 6. write(blkdev, address)
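+
+The scenario above can be sketched in C for a Linux system with kernel
+headers. F2FS_IOC_SET_PIN_FILE is defined here as a fallback with magic 0xf5 and
+command number 13, which is believed to match the f2fs uapi header but should be
+checked; pin_and_map() and write_raw() are hypothetical helpers. FIBMAP returns a
+block number in file system blocks and normally requires CAP_SYS_RAWIO, and the
+sketch assumes 4KB blocks when converting it to a byte offset.
+
```c
#define _GNU_SOURCE            /* for fallocate() */
#include <fcntl.h>
#include <linux/fs.h>          /* FIBMAP */
#include <linux/types.h>       /* __u32 */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef F2FS_IOC_SET_PIN_FILE
/* Assumption: mirrors the f2fs uapi definition (magic 0xf5, command 13). */
#define F2FS_IOC_SET_PIN_FILE _IOW(0xf5, 13, __u32)
#endif

#define F2FS_BLKSIZE_ASSUMED 4096  /* assumed F2FS block size in bytes */

/*
 * Steps 1-4: create the file, pin it, preallocate `size` bytes with the
 * default fallocate mode, then map logical block `lblk` to a physical block
 * with FIBMAP. Returns 0 and stores the block number in *pblk, or -1.
 */
int pin_and_map(const char *path, off_t size, int lblk, int *pblk)
{
	__u32 pin = 1;
	int blk = lblk;            /* FIBMAP: logical block in, physical out */
	int fd = open(path, O_RDWR | O_CREAT, 0600);

	if (fd < 0)
		return -1;
	if (ioctl(fd, F2FS_IOC_SET_PIN_FILE, &pin) < 0 ||
	    fallocate(fd, 0, 0, size) < 0 ||
	    ioctl(fd, FIBMAP, &blk) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	*pblk = blk;
	return 0;
}

/* Steps 5-6: write `len` bytes to the block device at physical block `pblk`. */
ssize_t write_raw(const char *blkdev, int pblk, const void *buf, size_t len)
{
	int fd = open(blkdev, O_WRONLY);
	ssize_t ret;

	if (fd < 0)
		return -1;
	ret = pwrite(fd, buf, len, (off_t)pblk * F2FS_BLKSIZE_ASSUMED);
	close(fd);
	return ret;
}
```
+
+Pinning is intended to keep the cleaner from migrating the file's blocks, which
+is what allows the address obtained in step 4 to remain usable in step 6.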
+
+Compression implementation
+--------------------------
+
+- A new term, cluster, is defined as the basic unit of compression; a file can
+  be logically divided into multiple clusters. One cluster includes 4 << n
+  (n >= 0) logical pages, the compression unit is also the cluster size, and
+  each cluster can be either compressed or not.
+
+- In the cluster metadata layout, one special block address is used to
+  indicate whether a cluster is compressed or normal. For a compressed cluster,
+  the following metadata maps the cluster to [1, 4 << n - 1] physical blocks,
+  in which f2fs stores data including the compress header and the compressed
+  data.
+
+- In order to eliminate write amplification during overwrite, F2FS only
+  supports compression on write-once files; data can be compressed only when
+  all logical blocks in the file are valid and the cluster compression ratio
+  is lower than the specified threshold.
+
+- There are three ways to enable compression on a regular inode:
+
+  * chattr +c file
+  * chattr +c dir; touch dir/file
+  * mount w/ -o compress_extension=ext; touch file.ext
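+
+The cluster arithmetic above can be sketched as follows, assuming 4KB pages
+and blocks. The header size passed as hdr_bytes is a hypothetical parameter, and
+the "saves at least one block" test mirrors the [1, 4 << n - 1] range above
+rather than the kernel's exact threshold check.
+
```c
#include <stdbool.h>
#include <stdint.h>

#define BLK_BYTES 4096u  /* assumed page and block size */

/* Logical pages per cluster: 4 << n, n >= 0. */
unsigned int cluster_pages(unsigned int n)
{
	return 4u << n;
}

/* Which cluster a logical page belongs to, and its position inside it. */
uint64_t cluster_index(uint64_t page, unsigned int n)
{
	return page / cluster_pages(n);
}

unsigned int cluster_offset(uint64_t page, unsigned int n)
{
	return (unsigned int)(page % cluster_pages(n));
}

/* Blocks needed for a compress header of hdr_bytes plus clen bytes of data. */
unsigned int compressed_blocks(unsigned int hdr_bytes, unsigned int clen)
{
	return (hdr_bytes + clen + BLK_BYTES - 1) / BLK_BYTES;
}

/*
 * A compressed cluster maps to [1, 4 << n - 1] physical blocks, so storing
 * it compressed only makes sense if it saves at least one block.
 */
bool worth_compressing(unsigned int hdr_bytes, unsigned int clen, unsigned int n)
{
	unsigned int blocks = compressed_blocks(hdr_bytes, clen);

	return blocks >= 1 && blocks <= cluster_pages(n) - 1;
}
```
+
+With n = 0 (4-page clusters) and a 24-byte header, 8000 bytes of compressed data
+need 2 blocks and are worth storing compressed, while 12270 bytes need 4 blocks
+and are not.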
+
+Compress metadata layout::
+
+				[Dnode Structure]
+		+-----------------------------------------------+
+		| cluster 1 | cluster 2 | ......... | cluster N |
+		+-----------------------------------------------+
+		.           .                       .           .
+	.                       .                .                      .
+    .         Compressed Cluster       .        .        Normal Cluster            .
+    +----------+---------+---------+---------+  +---------+---------+---------+---------+
+    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+    +----------+---------+---------+---------+  +---------+---------+---------+---------+
+	    .                             .
+	    .                                           .
+	.                                                           .
+	+-------------+-------------+----------+----------------------------+
+	| data length | data chksum | reserved |      compressed data       |
+	+-------------+-------------+----------+----------------------------+
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
deleted file mode 100644
index 4eb3e2ddd00e..000000000000
--- a/Documentation/filesystems/f2fs.txt
+++ /dev/null
@@ -1,730 +0,0 @@
-================================================================================
-WHAT IS Flash-Friendly File System (F2FS)?
-================================================================================
-
-NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
-been equipped on a variety systems ranging from mobile to server systems. Since
-they are known to have different characteristics from the conventional rotating
-disks, a file system, an upper layer to the storage device, should adapt to the
-changes from the sketch in the design level.
-
-F2FS is a file system exploiting NAND flash memory-based storage devices, which
-is based on Log-structured File System (LFS). The design has been focused on
-addressing the fundamental issues in LFS, which are snowball effect of wandering
-tree and high cleaning overhead.
-
-Since a NAND flash memory-based storage device shows different characteristic
-according to its internal geometry or flash memory management scheme, namely FTL,
-F2FS and its tools support various parameters not only for configuring on-disk
-layout, but also for selecting allocation and cleaning algorithms.
-
-The following git tree provides the file system formatting tool (mkfs.f2fs),
-a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).
->> git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
-
-For reporting bugs and sending patches, please use the following mailing list:
->> linux-f2fs-devel@lists.sourceforge.net
-
-================================================================================
-BACKGROUND AND DESIGN ISSUES
-================================================================================
-
-Log-structured File System (LFS)
---------------------------------
-"A log-structured file system writes all modifications to disk sequentially in
-a log-like structure, thereby speeding up  both file writing and crash recovery.
-The log is the only structure on disk; it contains indexing information so that
-files can be read back from the log efficiently. In order to maintain large free
-areas on disk for fast writing, we divide  the log into segments and use a
-segment cleaner to compress the live information from heavily fragmented
-segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
-implementation of a log-structured file system", ACM Trans. Computer Systems
-10, 1, 26–52.
-
-Wandering Tree Problem
-----------------------
-In LFS, when a file data is updated and written to the end of log, its direct
-pointer block is updated due to the changed location. Then the indirect pointer
-block is also updated due to the direct pointer block update. In this manner,
-the upper index structures such as inode, inode map, and checkpoint block are
-also updated recursively. This problem is called as wandering tree problem [1],
-and in order to enhance the performance, it should eliminate or relax the update
-propagation as much as possible.
-
-[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
-
-Cleaning Overhead
------------------
-Since LFS is based on out-of-place writes, it produces so many obsolete blocks
-scattered across the whole storage. In order to serve new empty log space, it
-needs to reclaim these obsolete blocks seamlessly to users. This job is called
-as a cleaning process.
-
-The process consists of three operations as follows.
-1. A victim segment is selected through referencing segment usage table.
-2. It loads parent index structures of all the data in the victim identified by
-   segment summary blocks.
-3. It checks the cross-reference between the data and its parent index structure.
-4. It moves valid data selectively.
-
-This cleaning job may cause unexpected long delays, so the most important goal
-is to hide the latencies to users. And also definitely, it should reduce the
-amount of valid data to be moved, and move them quickly as well.
-
-================================================================================
-KEY FEATURES
-================================================================================
-
-Flash Awareness
----------------
-- Enlarge the random write area for better performance, but provide the high
-  spatial locality
-- Align FS data structures to the operational units in FTL as best efforts
-
-Wandering Tree Problem
-----------------------
-- Use a term, “node”, that represents inodes as well as various pointer blocks
-- Introduce Node Address Table (NAT) containing the locations of all the “node”
-  blocks; this will cut off the update propagation.
-
-Cleaning Overhead
------------------
-- Support a background cleaning process
-- Support greedy and cost-benefit algorithms for victim selection policies
-- Support multi-head logs for static/dynamic hot and cold data separation
-- Introduce adaptive logging for efficient block allocation
-
-================================================================================
-MOUNT OPTIONS
-================================================================================
-
-background_gc=%s       Turn on/off cleaning operations, namely garbage
-                       collection, triggered in background when I/O subsystem is
-                       idle. If background_gc=on, it will turn on the garbage
-                       collection and if background_gc=off, garbage collection
-                       will be turned off. If background_gc=sync, it will turn
-                       on synchronous garbage collection running in background.
-                       Default value for this option is on. So garbage
-                       collection is on by default.
-disable_roll_forward   Disable the roll-forward recovery routine
-norecovery             Disable the roll-forward recovery routine, mounted read-
-                       only (i.e., -o ro,disable_roll_forward)
-discard/nodiscard      Enable/disable real-time discard in f2fs, if discard is
-                       enabled, f2fs will issue discard/TRIM commands when a
-		       segment is cleaned.
-no_heap                Disable heap-style segment allocation which finds free
-                       segments for data from the beginning of main area, while
-		       for node from the end of main area.
-nouser_xattr           Disable Extended User Attributes. Note: xattr is enabled
-                       by default if CONFIG_F2FS_FS_XATTR is selected.
-noacl                  Disable POSIX Access Control List. Note: acl is enabled
-                       by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
-active_logs=%u         Support configuring the number of active logs. In the
-                       current design, f2fs supports only 2, 4, and 6 logs.
-                       Default number is 6.
-disable_ext_identify   Disable the extension list configured by mkfs, so f2fs
-                       does not aware of cold files such as media files.
-inline_xattr           Enable the inline xattrs feature.
-noinline_xattr         Disable the inline xattrs feature.
-inline_xattr_size=%u   Support configuring inline xattr size, it depends on
-		       flexible inline xattr feature.
-inline_data            Enable the inline data feature: New created small(<~3.4k)
-                       files can be written into inode block.
-inline_dentry          Enable the inline dir feature: data in new created
-                       directory entries can be written into inode block. The
-                       space of inode block which is used to store inline
-                       dentries is limited to ~3.4k.
-noinline_dentry        Disable the inline dentry feature.
-flush_merge	       Merge concurrent cache_flush commands as much as possible
-                       to eliminate redundant command issues. If the underlying
-		       device handles the cache_flush command relatively slowly,
-		       recommend to enable this option.
-nobarrier              This option can be used if underlying storage guarantees
-                       its cached data should be written to the novolatile area.
-		       If this option is set, no cache_flush commands are issued
-		       but f2fs still guarantees the write ordering of all the
-		       data writes.
-fastboot               This option is used when a system wants to reduce mount
-                       time as much as possible, even though normal performance
-		       can be sacrificed.
-extent_cache           Enable an extent cache based on rb-tree, it can cache
-                       as many as extent which map between contiguous logical
-                       address and physical address per inode, resulting in
-                       increasing the cache hit ratio. Set by default.
-noextent_cache         Disable an extent cache based on rb-tree explicitly, see
-                       the above extent_cache mount option.
-noinline_data          Disable the inline data feature, inline data feature is
-                       enabled by default.
-data_flush             Enable data flushing before checkpoint in order to
-                       persist data of regular and symlink.
-reserve_root=%d        Support configuring reserved space which is used for
-                       allocation from a privileged user with specified uid or
-                       gid, unit: 4KB, the default limit is 0.2% of user blocks.
-resuid=%d              The user ID which may use the reserved blocks.
-resgid=%d              The group ID which may use the reserved blocks.
-fault_injection=%d     Enable fault injection in all supported types with
-                       specified injection rate.
-fault_type=%d          Support configuring fault injection type, should be
-                       enabled with fault_injection option, fault type value
-                       is shown below, it supports single or combined type.
-                       Type_Name		Type_Value
-                       FAULT_KMALLOC		0x000000001
-                       FAULT_KVMALLOC		0x000000002
-                       FAULT_PAGE_ALLOC		0x000000004
-                       FAULT_PAGE_GET		0x000000008
-                       FAULT_ALLOC_BIO		0x000000010
-                       FAULT_ALLOC_NID		0x000000020
-                       FAULT_ORPHAN		0x000000040
-                       FAULT_BLOCK		0x000000080
-                       FAULT_DIR_DEPTH		0x000000100
-                       FAULT_EVICT_INODE	0x000000200
-                       FAULT_TRUNCATE		0x000000400
-                       FAULT_READ_IO		0x000000800
-                       FAULT_CHECKPOINT		0x000001000
-                       FAULT_DISCARD		0x000002000
-                       FAULT_WRITE_IO		0x000004000
-mode=%s                Control block allocation mode which supports "adaptive"
-                       and "lfs". In "lfs" mode, there should be no random
-                       writes towards main area.
-io_bits=%u             Set the bit size of write IO requests. It should be set
-                       with "mode=lfs".
-usrquota               Enable plain user disk quota accounting.
-grpquota               Enable plain group disk quota accounting.
-prjquota               Enable plain project quota accounting.
-usrjquota=<file>       Appoint specified file and type during mount, so that quota
-grpjquota=<file>       information can be properly updated during recovery flow,
-prjjquota=<file>       <quota file>: must be in root directory;
-jqfmt=<quota type>     <quota type>: [vfsold,vfsv0,vfsv1].
-offusrjquota           Turn off user journelled quota.
-offgrpjquota           Turn off group journelled quota.
-offprjjquota           Turn off project journelled quota.
-quota                  Enable plain user disk quota accounting.
-noquota                Disable all plain disk quota option.
-whint_mode=%s          Control which write hints are passed down to block
-                       layer. This supports "off", "user-based", and
-                       "fs-based".  In "off" mode (default), f2fs does not pass
-                       down hints. In "user-based" mode, f2fs tries to pass
-                       down hints given by users. And in "fs-based" mode, f2fs
-                       passes down hints with its policy.
-alloc_mode=%s          Adjust block allocation policy, which supports "reuse"
-                       and "default".
-fsync_mode=%s          Control the policy of fsync. Currently supports "posix",
-                       "strict", and "nobarrier". In "posix" mode, which is
-                       default, fsync will follow POSIX semantics and does a
-                       light operation to improve the filesystem performance.
-                       In "strict" mode, fsync will be heavy and behaves in line
-                       with xfs, ext4 and btrfs, where xfstest generic/342 will
-                       pass, but the performance will regress. "nobarrier" is
-                       based on "posix", but doesn't issue flush command for
-                       non-atomic files likewise "nobarrier" mount option.
-test_dummy_encryption  Enable dummy encryption, which provides a fake fscrypt
-                       context. The fake fscrypt context is used by xfstests.
-checkpoint=%s[:%u[%]]     Set to "disable" to turn off checkpointing. Set to "enable"
-                       to reenable checkpointing. Is enabled by default. While
-                       disabled, any unmounting or unexpected shutdowns will cause
-                       the filesystem contents to appear as they did when the
-                       filesystem was mounted with that option.
-                       While mounting with checkpoint=disabled, the filesystem must
-                       run garbage collection to ensure that all available space can
-                       be used. If this takes too much time, the mount may return
-                       EAGAIN. You may optionally add a value to indicate how much
-                       of the disk you would be willing to temporarily give up to
-                       avoid additional garbage collection. This can be given as a
-                       number of blocks, or as a percent. For instance, mounting
-                       with checkpoint=disable:100% would always succeed, but it may
-                       hide up to all remaining free space. The actual space that
-                       would be unusable can be viewed at /sys/fs/f2fs/<disk>/unusable
-                       This space is reclaimed once checkpoint=enable.
-compress_algorithm=%s  Control compress algorithm, currently f2fs supports "lzo"
-                       and "lz4" algorithm.
-compress_log_size=%u   Support configuring compress cluster size, the size will
-                       be 4KB * (1 << %u), 16KB is minimum size, also it's
-                       default size.
-compress_extension=%s  Support adding specified extension, so that f2fs can enable
-                       compression on those corresponding files, e.g. if all files
-                       with '.ext' has high compression rate, we can set the '.ext'
-                       on compression extension list and enable compression on
-                       these file by default rather than to enable it via ioctl.
-                       For other files, we can still enable compression via ioctl.
-
-================================================================================
-DEBUGFS ENTRIES
-================================================================================
-
-/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
-f2fs. Each file shows the whole f2fs information.
-
-/sys/kernel/debug/f2fs/status includes:
- - major file system information managed by f2fs currently
- - average SIT information about whole segments
- - current memory footprint consumed by f2fs.
-
-================================================================================
-SYSFS ENTRIES
-================================================================================
-
-Information about mounted f2fs file systems can be found in
-/sys/fs/f2fs.  Each mounted filesystem will have a directory in
-/sys/fs/f2fs based on its device name (i.e., /sys/fs/f2fs/sda).
-The files in each per-device directory are shown in table below.
-
-Files in /sys/fs/f2fs/<devname>
-(see also Documentation/ABI/testing/sysfs-fs-f2fs)
-
-================================================================================
-USAGE
-================================================================================
-
-1. Download userland tools and compile them.
-
-2. Skip, if f2fs was compiled statically inside kernel.
-   Otherwise, insert the f2fs.ko module.
- # insmod f2fs.ko
-
-3. Create a directory trying to mount
- # mkdir /mnt/f2fs
-
-4. Format the block device, and then mount as f2fs
- # mkfs.f2fs -l label /dev/block_device
- # mount -t f2fs /dev/block_device /mnt/f2fs
-
-mkfs.f2fs
----------
-The mkfs.f2fs is for the use of formatting a partition as the f2fs filesystem,
-which builds a basic on-disk layout.
-
-The options consist of:
--l [label]   : Give a volume label, up to 512 unicode name.
--a [0 or 1]  : Split start location of each area for heap-based allocation.
-               1 is set by default, which performs this.
--o [int]     : Set overprovision ratio in percent over volume size.
-               5 is set by default.
--s [int]     : Set the number of segments per section.
-               1 is set by default.
--z [int]     : Set the number of sections per zone.
-               1 is set by default.
--e [str]     : Set basic extension list. e.g. "mp3,gif,mov"
--t [0 or 1]  : Disable discard command or not.
-               1 is set by default, which conducts discard.
-
-fsck.f2fs
----------
-The fsck.f2fs is a tool to check the consistency of an f2fs-formatted
-partition, which examines whether the filesystem metadata and user-made data
-are cross-referenced correctly or not.
-Note that, initial version of the tool does not fix any inconsistency.
-
-The options consist of:
-  -d debug level [default:0]
-
-dump.f2fs
----------
-The dump.f2fs shows the information of specific inode and dumps SSA and SIT to
-file. Each file is dump_ssa and dump_sit.
-
-The dump.f2fs is used to debug on-disk data structures of the f2fs filesystem.
-It shows on-disk inode information recognized by a given inode number, and is
-able to dump all the SSA and SIT entries into predefined files, ./dump_ssa and
-./dump_sit respectively.
-
-The options consist of:
-  -d debug level [default:0]
-  -i inode no (hex)
-  -s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
-  -a [SSA dump segno from #1~#2 (decimal), for all 0~-1]
-
-Examples:
-# dump.f2fs -i [ino] /dev/sdx
-# dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
-# dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
-
-================================================================================
-DESIGN
-================================================================================
-
-On-disk Layout
---------------
-
-F2FS divides the whole volume into a number of segments, each of which is fixed
-to 2MB in size. A section is composed of consecutive segments, and a zone
-consists of a set of sections. By default, section and zone sizes are set to one
-segment size identically, but users can easily modify the sizes by mkfs.
-
-F2FS splits the entire volume into six areas, and all the areas except superblock
-consists of multiple segments as described below.
-
-                                            align with the zone size <-|
-                 |-> align with the segment size
-     _________________________________________________________________________
-    |            |            |   Segment   |    Node     |   Segment  |      |
-    | Superblock | Checkpoint |    Info.    |   Address   |   Summary  | Main |
-    |    (SB)    |   (CP)     | Table (SIT) | Table (NAT) | Area (SSA) |      |
-    |____________|_____2______|______N______|______N______|______N_____|__N___|
-                                                                       .      .
-                                                             .                .
-                                                 .                            .
-                                    ._________________________________________.
-                                    |_Segment_|_..._|_Segment_|_..._|_Segment_|
-                                    .           .
-                                    ._________._________
-                                    |_section_|__...__|_
-                                    .            .
-		                    .________.
-	                            |__zone__|
-
-- Superblock (SB)
- : It is located at the beginning of the partition, and there exist two copies
-   to avoid file system crash. It contains basic partition information and some
-   default parameters of f2fs.
-
-- Checkpoint (CP)
- : It contains file system information, bitmaps for valid NAT/SIT sets, orphan
-   inode lists, and summary entries of current active segments.
-
-- Segment Information Table (SIT)
- : It contains segment information such as valid block count and bitmap for the
-   validity of all the blocks.
-
-- Node Address Table (NAT)
- : It is composed of a block address table for all the node blocks stored in
-   Main area.
-
-- Segment Summary Area (SSA)
- : It contains summary entries which contains the owner information of all the
-   data and node blocks stored in Main area.
-
-- Main Area
- : It contains file and directory data including their indices.
-
-In order to avoid misalignment between file system and flash-based storage, F2FS
-aligns the start block address of CP with the segment size. Also, it aligns the
-start block address of Main area with the zone size by reserving some segments
-in SSA area.
-
-Reference the following survey for additional technical details.
-https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey
-
-File System Metadata Structure
-------------------------------
-
-F2FS adopts the checkpointing scheme to maintain file system consistency. At
-mount time, F2FS first tries to find the last valid checkpoint data by scanning
-CP area. In order to reduce the scanning time, F2FS uses only two copies of CP.
-One of them always indicates the last valid data, which is called as shadow copy
-mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
-
-For file system consistency, each CP points to which NAT and SIT copies are
-valid, as shown as below.
-
-  +--------+----------+---------+
-  |   CP   |    SIT   |   NAT   |
-  +--------+----------+---------+
-  .         .          .          .
-  .            .              .              .
-  .               .                 .                 .
-  +-------+-------+--------+--------+--------+--------+
-  | CP #0 | CP #1 | SIT #0 | SIT #1 | NAT #0 | NAT #1 |
-  +-------+-------+--------+--------+--------+--------+
-     |             ^                          ^
-     |             |                          |
-     `----------------------------------------'
-
-Index Structure
----------------
-
-The key data structure to manage the data locations is a "node". Similar to
-traditional file structures, F2FS has three types of node: inode, direct node,
-indirect node. F2FS assigns 4KB to an inode block which contains 923 data block
-indices, two direct node pointers, two indirect node pointers, and one double
-indirect node pointer as described below. One direct node block contains 1018
-data blocks, and one indirect node block contains also 1018 node blocks. Thus,
-one inode block (i.e., a file) covers:
-
-  4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
-
-   Inode block (4KB)
-     |- data (923)
-     |- direct node (2)
-     |          `- data (1018)
-     |- indirect node (2)
-     |            `- direct node (1018)
-     |                       `- data (1018)
-     `- double indirect node (1)
-                         `- indirect node (1018)
-			              `- direct node (1018)
-	                                         `- data (1018)
-
-Note that, all the node blocks are mapped by NAT which means the location of
-each node is translated by the NAT table. In the consideration of the wandering
-tree problem, F2FS is able to cut off the propagation of node updates caused by
-leaf data writes.
-
-Directory Structure
--------------------
-
-A directory entry occupies 11 bytes, which consists of the following attributes.
-
-- hash		hash value of the file name
-- ino		inode number
-- len		the length of file name
-- type		file type such as directory, symlink, etc
-
-A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
-used to represent whether each dentry is valid or not. A dentry block occupies
-4KB with the following composition.
-
-  Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
-	              dentries(11 * 214 bytes) + file name (8 * 214 bytes)
-
-                         [Bucket]
-             +--------------------------------+
-             |dentry block 1 | dentry block 2 |
-             +--------------------------------+
-             .               .
-       .                             .
-  .       [Dentry Block Structure: 4KB]       .
-  +--------+----------+----------+------------+
-  | bitmap | reserved | dentries | file names |
-  +--------+----------+----------+------------+
-  [Dentry Block: 4KB] .   .
-		 .               .
-            .                          .
-            +------+------+-----+------+
-            | hash | ino  | len | type |
-            +------+------+-----+------+
-            [Dentry Structure: 11 bytes]
-
-F2FS implements multi-level hash tables for directory structure. Each level has
-a hash table with dedicated number of hash buckets as shown below. Note that
-"A(2B)" means a bucket includes 2 data blocks.
-
-----------------------
-A : bucket
-B : block
-N : MAX_DIR_HASH_DEPTH
-----------------------
-
-level #0   | A(2B)
-           |
-level #1   | A(2B) - A(2B)
-           |
-level #2   | A(2B) - A(2B) - A(2B) - A(2B)
-     .     |   .       .       .       .
-level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
-     .     |   .       .       .       .
-level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
-
-The number of blocks and buckets are determined by,
-
-                            ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
-  # of blocks in level #n = |
-                            `- 4, Otherwise
-
-                             ,- 2^(n + dir_level),
-			     |        if n + dir_level < MAX_DIR_HASH_DEPTH / 2,
-  # of buckets in level #n = |
-                             `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1),
-			              Otherwise
-
-When F2FS finds a file name in a directory, at first a hash value of the file
-name is calculated. Then, F2FS scans the hash table in level #0 to find the
-dentry consisting of the file name and its inode number. If not found, F2FS
-scans the next hash table in level #1. In this way, F2FS scans hash tables in
-each levels incrementally from 1 to N. In each levels F2FS needs to scan only
-one bucket determined by the following equation, which shows O(log(# of files))
-complexity.
-
-  bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
-
-In the case of file creation, F2FS finds empty consecutive slots that cover the
-file name. F2FS searches the empty slots in the hash tables of whole levels from
-1 to N in the same way as the lookup operation.
-
-The following figure shows an example of two cases holding children.
-       --------------> Dir <--------------
-       |                                 |
-    child                             child
-
-    child - child                     [hole] - child
-
-    child - child - child             [hole] - [hole] - child
-
-   Case 1:                           Case 2:
-   Number of children = 6,           Number of children = 3,
-   File size = 7                     File size = 7
-
-Default Block Allocation
-------------------------
-
-At runtime, F2FS manages six active logs inside "Main" area: Hot/Warm/Cold node
-and Hot/Warm/Cold data.
-
-- Hot node	contains direct node blocks of directories.
-- Warm node	contains direct node blocks except hot node blocks.
-- Cold node	contains indirect node blocks
-- Hot data	contains dentry blocks
-- Warm data	contains data blocks except hot and cold data blocks
-- Cold data	contains multimedia data or migrated data blocks
-
-LFS has two schemes for free space management: threaded log and copy-and-compac-
-tion. The copy-and-compaction scheme which is known as cleaning, is well-suited
-for devices showing very good sequential write performance, since free segments
-are served all the time for writing new data. However, it suffers from cleaning
-overhead under high utilization. Contrarily, the threaded log scheme suffers
-from random writes, but no cleaning process is needed. F2FS adopts a hybrid
-scheme where the copy-and-compaction scheme is adopted by default, but the
-policy is dynamically changed to the threaded log scheme according to the file
-system status.
-
-In order to align F2FS with underlying flash-based storage, F2FS allocates a
-segment in a unit of section. F2FS expects that the section size would be the
-same as the unit size of garbage collection in FTL. Furthermore, with respect
-to the mapping granularity in FTL, F2FS allocates each section of the active
-logs from different zones as much as possible, since FTL can write the data in
-the active logs into one allocation unit according to its mapping granularity.
-
-Cleaning process
-----------------
-
-F2FS does cleaning both on demand and in the background. On-demand cleaning is
-triggered when there are not enough free segments to serve VFS calls. Background
-cleaner is operated by a kernel thread, and triggers the cleaning job when the
-system is idle.
-
-F2FS supports two victim selection policies: greedy and cost-benefit algorithms.
-In the greedy algorithm, F2FS selects a victim segment having the smallest number
-of valid blocks. In the cost-benefit algorithm, F2FS selects a victim segment
-according to the segment age and the number of valid blocks in order to address
-log block thrashing problem in the greedy algorithm. F2FS adopts the greedy
-algorithm for on-demand cleaner, while background cleaner adopts cost-benefit
-algorithm.
-
-In order to identify whether the data in the victim segment are valid or not,
-F2FS manages a bitmap. Each bit represents the validity of a block, and the
-bitmap is composed of a bit stream covering whole blocks in main area.
-
-Write-hint Policy
------------------
-
-1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
-
-2) whint_mode=user-based. F2FS tries to pass down hints given by
-users.
-
-User                  F2FS                     Block
-----                  ----                     -----
-                      META                     WRITE_LIFE_NOT_SET
-                      HOT_NODE                 "
-                      WARM_NODE                "
-                      COLD_NODE                "
-*ioctl(COLD)          COLD_DATA                WRITE_LIFE_EXTREME
-*extension list       "                        "
-
--- buffered io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
-WRITE_LIFE_NONE       "                        "
-WRITE_LIFE_MEDIUM     "                        "
-WRITE_LIFE_LONG       "                        "
-
--- direct io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
-WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
-WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
-WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
-
-3) whint_mode=fs-based. F2FS passes down hints with its policy.
-
-User                  F2FS                     Block
-----                  ----                     -----
-                      META                     WRITE_LIFE_MEDIUM;
-                      HOT_NODE                 WRITE_LIFE_NOT_SET
-                      WARM_NODE                "
-                      COLD_NODE                WRITE_LIFE_NONE
-ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
-extension list        "                        "
-
--- buffered io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
-WRITE_LIFE_NONE       "                        "
-WRITE_LIFE_MEDIUM     "                        "
-WRITE_LIFE_LONG       "                        "
-
--- direct io
-WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
-WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
-WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
-WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
-WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
-WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
-
-Fallocate(2) Policy
--------------------
-
-The default policy follows the below posix rule.
-
-Allocating disk space
-    The default operation (i.e., mode is zero) of fallocate() allocates
-    the disk space within the range specified by offset and len.  The
-    file size (as reported by stat(2)) will be changed if offset+len is
-    greater than the file size.  Any subregion within the range specified
-    by offset and len that did not contain data before the call will be
-    initialized to zero.  This default behavior closely resembles the
-    behavior of the posix_fallocate(3) library function, and is intended
-    as a method of optimally implementing that function.
-
-However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) in prior to
-fallocate(fd, DEFAULT_MODE), it allocates on-disk blocks addressess having
-zero or random data, which is useful to the below scenario where:
- 1. create(fd)
- 2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
- 3. fallocate(fd, 0, 0, size)
- 4. address = fibmap(fd, offset)
- 5. open(blkdev)
- 6. write(blkdev, address)
-
-Compression implementation
---------------------------
-
-- New term named cluster is defined as basic unit of compression, file can
-be divided into multiple clusters logically. One cluster includes 4 << n
-(n >= 0) logical pages, compression size is also cluster size, each of
-cluster can be compressed or not.
-
-- In cluster metadata layout, one special block address is used to indicate
-cluster is compressed one or normal one, for compressed cluster, following
-metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs
-stores data including compress header and compressed data.
-
-- In order to eliminate write amplification during overwrite, F2FS only
-support compression on write-once file, data can be compressed only when
-all logical blocks in file are valid and cluster compress ratio is lower
-than specified threshold.
-
-- To enable compression on regular inode, there are three ways:
-* chattr +c file
-* chattr +c dir; touch dir/file
-* mount w/ -o compress_extension=ext; touch file.ext
-
-Compress metadata layout:
-                             [Dnode Structure]
-             +-----------------------------------------------+
-             | cluster 1 | cluster 2 | ......... | cluster N |
-             +-----------------------------------------------+
-             .           .                       .           .
-       .                       .                .                      .
-  .         Compressed Cluster       .        .        Normal Cluster            .
-+----------+---------+---------+---------+  +---------+---------+---------+---------+
-|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
-+----------+---------+---------+---------+  +---------+---------+---------+---------+
-           .                             .
-         .                                           .
-       .                                                           .
-      +-------------+-------------+----------+----------------------------+
-      | data length | data chksum | reserved |      compressed data       |
-      +-------------+-------------+----------+----------------------------+
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index aa2c3d1de3de..f69d20406be0 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -64,6 +64,7 @@ Documentation for filesystem implementations.
    erofs
    ext2
    ext3
+   f2fs
    fuse
    overlayfs
    virtiofs
-- 
cgit 
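The inode coverage and dentry block figures in the f2fs.txt text removed above can be checked with a short calculation. Below is a minimal Python sketch that uses the 4KB block size and the per-block counts quoted in that text. The dentry field widths (hash 4, ino 4, len 2, type 1 bytes) are an assumption chosen so that they add up to the stated 11 bytes.

```python
# Constants quoted from the f2fs.txt text above (4KB blocks).
BLOCK_SIZE = 4096
ADDRS_PER_INODE = 923   # data block indices stored directly in the inode
ADDRS_PER_BLOCK = 1018  # data block pointers per direct node block
NIDS_PER_BLOCK = 1018   # node pointers per indirect node block
DENTRY_SLOTS = 214      # dentry slots per 4KB dentry block


def max_file_blocks() -> int:
    """Count the data blocks reachable from one inode via the node tree."""
    direct = 2 * ADDRS_PER_BLOCK                                   # 2 direct nodes
    indirect = 2 * NIDS_PER_BLOCK * ADDRS_PER_BLOCK                # 2 indirect nodes
    dindirect = NIDS_PER_BLOCK * NIDS_PER_BLOCK * ADDRS_PER_BLOCK  # 1 double indirect
    return ADDRS_PER_INODE + direct + indirect + dindirect


def dentry_block_bytes(slots: int = DENTRY_SLOTS) -> int:
    """Bytes used by one dentry block: bitmap + reserved + dentries + names."""
    bitmap = (slots + 7) // 8  # one validity bit per slot, rounded up to bytes
    reserved = 3
    dentries = 11 * slots      # assumed widths: hash 4 + ino 4 + len 2 + type 1
    names = 8 * slots          # 8 bytes of file name per slot
    return bitmap + reserved + dentries + names


if __name__ == "__main__":
    size = max_file_blocks() * BLOCK_SIZE
    print(f"{size / 2**40:.2f} TiB")      # 3.94 TiB
    print(dentry_block_bytes(), "bytes")  # 4096 bytes
```

The first figure is in units of 2^40 bytes, so the "3.94TB" in the text is more precisely 3.94 TiB. The in-kernel limit may be lower, for example when inline extended attributes use some of the inode's 923 slots.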

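The multi-level directory lookup in the "Directory Structure" section of the removed f2fs.txt can be sketched in Python as follows. This is a minimal sketch under stated assumptions. MAX_DIR_HASH_DEPTH is taken to be 63, and `MAX_DIR_HASH_DEPTH // 2` mirrors C integer division. The file name hash is passed in rather than computed. `current_depth` (the number of levels the directory has grown to) and the dict-based `table` are hypothetical stand-ins for on-disk state.

```python
MAX_DIR_HASH_DEPTH = 63  # assumption: value of the kernel constant
HALF_DEPTH = MAX_DIR_HASH_DEPTH // 2  # integer division, as in C


def blocks_in_level(n: int) -> int:
    """Blocks per bucket: 2 in the lower levels, 4 from HALF_DEPTH upward."""
    return 2 if n < HALF_DEPTH else 4


def buckets_in_level(n: int, dir_level: int = 0) -> int:
    """Bucket count doubles per level until it is capped."""
    if n + dir_level < HALF_DEPTH:
        return 1 << (n + dir_level)
    return 1 << (HALF_DEPTH - 1)


def bucket_to_scan(hash_value: int, n: int, dir_level: int = 0) -> int:
    """Only one bucket per level can hold a given hash."""
    return hash_value % buckets_in_level(n, dir_level)


def lookup(table, name_hash, name, current_depth, dir_level=0):
    """Return the inode number for `name`, or None if it is absent.

    `table` is a hypothetical in-memory stand-in for the on-disk buckets:
    a dict mapping (level, bucket) -> list of (hash, name, ino) entries.
    """
    for level in range(current_depth):  # level #0 first, then deeper levels
        bucket = bucket_to_scan(name_hash, level, dir_level)
        for h, entry_name, ino in table.get((level, bucket), []):
            # Compare the cheap hash first, then the full name.
            if h == name_hash and entry_name == name:
                return ino
    return None


if __name__ == "__main__":
    demo = {(0, 0): [(13, "a", 100)], (2, 1): [(13, "b", 200)]}
    print(lookup(demo, 13, "b", current_depth=3))  # 200
```

The demo shows why a hash collision is harmless: "a" and "b" share hash 13, but the name comparison rejects "a" in level #0. The search then continues to bucket 13 % 4 = 1 of level #2.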

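The two victim selection policies in the "Cleaning process" section of the removed f2fs.txt can be contrasted in a few lines of Python. The cost-benefit score below is the classic LFS formula (1 - u) * age / (1 + u), where u is segment utilization. It illustrates the idea and is not necessarily the exact expression F2FS computes. The `Segment` record, its `age` field, and 512 blocks per segment (2MB segments of 4KB blocks) are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segno: int
    valid_blocks: int  # e.g. the valid block count kept in the SIT
    age: float         # hypothetical: time since the segment was last written


def greedy_victim(segments):
    """Greedy: fewest valid blocks means the least data to migrate."""
    return min(segments, key=lambda s: s.valid_blocks)


def cost_benefit_victim(segments, blocks_per_seg=512):
    """Cost-benefit: weigh reclaimed space and age against copying cost."""
    def score(s):
        u = s.valid_blocks / blocks_per_seg  # utilization in [0, 1]
        return (1 - u) * s.age / (1 + u)
    return max(segments, key=score)


if __name__ == "__main__":
    segs = [Segment(0, 100, 1.0), Segment(1, 200, 100.0), Segment(2, 500, 5.0)]
    print(greedy_victim(segs).segno)        # 0
    print(cost_benefit_victim(segs).segno)  # 1
```

With these inputs, greedy picks segment 0, which is nearly empty but was written recently. Cost-benefit prefers the older segment 1, whose remaining data is less likely to be overwritten soon. This is the thrashing the text says the cost-benefit policy addresses.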
From 720c2fc1ec7cb36bfc5326603522bc3955534773 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:05 +0100
Subject: docs: filesystems: convert gfs2.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Link: https://lore.kernel.org/r/6d7a296de025bcfed7a229da7f8cc1678944f304.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/gfs2.rst  | 53 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/gfs2.txt  | 45 -------------------------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 54 insertions(+), 45 deletions(-)
 create mode 100644 Documentation/filesystems/gfs2.rst
 delete mode 100644 Documentation/filesystems/gfs2.txt

diff --git a/Documentation/filesystems/gfs2.rst b/Documentation/filesystems/gfs2.rst
new file mode 100644
index 000000000000..8d1ab589ce18
--- /dev/null
+++ b/Documentation/filesystems/gfs2.rst
@@ -0,0 +1,53 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+Global File System
+==================
+
+https://fedorahosted.org/cluster/wiki/HomePage
+
+GFS is a cluster file system. It allows a cluster of computers to
+simultaneously use a block device that is shared between them (with FC,
+iSCSI, NBD, etc.).  GFS reads and writes to the block device like a local
+file system, but also uses a lock module to let the computers coordinate
+their I/O so file system consistency is maintained.  One of the nifty
+features of GFS is perfect consistency -- changes made to the file system
+on one machine show up immediately on all other machines in the cluster.
+
+GFS uses interchangeable inter-node locking mechanisms; the currently
+supported mechanisms are:
+
+  lock_nolock
+    - allows GFS to be used as a local file system
+
+  lock_dlm
+    - uses a distributed lock manager (dlm) for inter-node locking.
+      The dlm is found at linux/fs/dlm/
+
+Lock_dlm depends on user space cluster management systems found
+at the URL above.
+
+No external clustering systems are needed to use GFS as a local file
+system; simply run::
+
+  $ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
+  $ mount -t gfs2 /dev/block_device /dir
+
+If you are using Fedora, you need to install the gfs2-utils package
+and, for lock_dlm, you will also need to install the cman package
+and write a cluster.conf as per the documentation. For Fedora 17 and
+later, cman has been replaced by the dlm package.
+
+GFS2 is not on-disk compatible with previous versions of GFS, but it
+is pretty close.
+
+The following man pages can be found at the URL above:
+
+  ============		=============================================
+  fsck.gfs2		to repair a filesystem
+  gfs2_grow		to expand a filesystem online
+  gfs2_jadd		to add journals to a filesystem online
+  tunegfs2		to manipulate, examine and tune a filesystem
+  gfs2_convert		to convert a gfs filesystem to gfs2 in-place
+  mkfs.gfs2		to make a filesystem
+  ============		=============================================
diff --git a/Documentation/filesystems/gfs2.txt b/Documentation/filesystems/gfs2.txt
deleted file mode 100644
index cc4f2306609e..000000000000
--- a/Documentation/filesystems/gfs2.txt
+++ /dev/null
@@ -1,45 +0,0 @@
-Global File System
-------------------
-
-https://fedorahosted.org/cluster/wiki/HomePage
-
-GFS is a cluster file system. It allows a cluster of computers to
-simultaneously use a block device that is shared between them (with FC,
-iSCSI, NBD, etc).  GFS reads and writes to the block device like a local
-file system, but also uses a lock module to allow the computers coordinate
-their I/O so file system consistency is maintained.  One of the nifty
-features of GFS is perfect consistency -- changes made to the file system
-on one machine show up immediately on all other machines in the cluster.
-
-GFS uses interchangeable inter-node locking mechanisms, the currently
-supported mechanisms are:
-
-  lock_nolock -- allows gfs to be used as a local file system
-
-  lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
-  The dlm is found at linux/fs/dlm/
-
-Lock_dlm depends on user space cluster management systems found
-at the URL above.
-
-To use gfs as a local file system, no external clustering systems are
-needed, simply:
-
-  $ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
-  $ mount -t gfs2 /dev/block_device /dir
-
-If you are using Fedora, you need to install the gfs2-utils package
-and, for lock_dlm, you will also need to install the cman package
-and write a cluster.conf as per the documentation. For F17 and above
-cman has been replaced by the dlm package.
-
-GFS2 is not on-disk compatible with previous versions of GFS, but it
-is pretty close.
-
-The following man pages can be found at the URL above:
-  fsck.gfs2		to repair a filesystem
-  gfs2_grow		to expand a filesystem online
-  gfs2_jadd		to add journals to a filesystem online
-  tunegfs2		to manipulate, examine and tune a filesystem
-  gfs2_convert	to convert a gfs filesystem to gfs2 in-place
-  mkfs.gfs2		to make a filesystem
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f69d20406be0..f24befe78326 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -65,6 +65,7 @@ Documentation for filesystem implementations.
    ext2
    ext3
    f2fs
+   gfs2
    fuse
    overlayfs
    virtiofs
-- 


From 5b7ac27a6e2c54cc09f479b616f1076afeae3c1b Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:06 +0100
Subject: docs: filesystems: convert gfs2-uevents.txt to ReST

This document is almost in ReST format: all it needs is adjusted
titles and an SPDX header. In other words:

- Add a SPDX header;
- Add a document title;
- Adjust section titles;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Link: https://lore.kernel.org/r/1d1c46b7e86bd0a18d9abbea0de0bc2be84e5e2b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/gfs2-uevents.rst | 112 +++++++++++++++++++++++++++++
 Documentation/filesystems/gfs2-uevents.txt | 100 --------------------------
 Documentation/filesystems/index.rst        |   1 +
 3 files changed, 113 insertions(+), 100 deletions(-)
 create mode 100644 Documentation/filesystems/gfs2-uevents.rst
 delete mode 100644 Documentation/filesystems/gfs2-uevents.txt

diff --git a/Documentation/filesystems/gfs2-uevents.rst b/Documentation/filesystems/gfs2-uevents.rst
new file mode 100644
index 000000000000..f162a2c76c69
--- /dev/null
+++ b/Documentation/filesystems/gfs2-uevents.rst
@@ -0,0 +1,112 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+uevents and GFS2
+================
+
+During the lifetime of a GFS2 mount, a number of uevents are generated.
+This document explains what the events are and what they are used
+for (by gfs_controld in gfs2-utils).
+
+A list of GFS2 uevents
+======================
+
+1. ADD
+------
+
+The ADD event occurs at mount time. It will always be the first
+uevent generated by the newly created filesystem. If the mount
+is successful, an ONLINE uevent will follow.  If it is not successful,
+a REMOVE uevent will follow instead.
+
+The ADD uevent has two environment variables: SPECTATOR=[0|1]
+and RDONLY=[0|1] that specify the spectator status (a read-only mount
+with no journal assigned), and read-only (with journal assigned) status
+of the filesystem respectively.
+
+2. ONLINE
+---------
+
+The ONLINE uevent is generated after a successful mount or remount. It
+has the same environment variables as the ADD uevent. The ONLINE
+uevent and the two environment variables, SPECTATOR and RDONLY, are
+relatively recent additions (2.6.32-rc+) and will not be generated
+by older kernels.
+
+3. CHANGE
+---------
+
+The CHANGE uevent is used in two places. One is when reporting the
+successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
+This is used as a signal by gfs_controld that it is then ok for other
+nodes in the cluster to mount the filesystem.
+
+The other CHANGE uevent is used to report the completion
+of journal recovery for one of the filesystem's journals. It has
+two environment variables, JID= which specifies the journal id which
+has just been recovered, and RECOVERY=[Done|Failed] to indicate the
+success (or otherwise) of the operation. These uevents are generated
+for every journal recovered, whether it is during the initial mount
+process or as the result of gfs_controld requesting a specific journal
+recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file.
+
+Because the CHANGE uevent was used (in early versions of gfs_controld)
+without checking the environment variables to discover the state, we
+cannot add any more functions to it without running the risk of
+someone using an older version of the user tools and breaking their
+cluster. For this reason the ONLINE uevent was used when adding a new
+uevent for a successful mount or remount.
+
+4. OFFLINE
+----------
+
+The OFFLINE uevent is only generated due to filesystem errors and is used
+as part of the "withdraw" mechanism. Currently this doesn't give any
+information about what the error is, which is something that needs to
+be fixed.
+
+5. REMOVE
+---------
+
+The REMOVE uevent is generated at the end of an unsuccessful mount
+or at the end of a umount of the filesystem. All REMOVE uevents will
+have been preceded by at least an ADD uevent for the same filesystem,
+and, unlike the other uevents, REMOVE is generated automatically by the
+kernel's kobject subsystem.
+
+
+Information common to all GFS2 uevents (uevent environment variables)
+=====================================================================
+
+1. LOCKTABLE=
+--------------
+
+The LOCKTABLE is a string, as supplied on the mount command
+line (locktable=) or via fstab. It is used as a filesystem label
+as well as providing the information for a lock_dlm mount to be
+able to join the cluster.
+
+2. LOCKPROTO=
+-------------
+
+The LOCKPROTO is a string, and its value depends on what is set
+on the mount command line, or via fstab. It will be either
+lock_nolock or lock_dlm. In the future other lock managers
+may be supported.
+
+3. JOURNALID=
+-------------
+
+If a journal is in use by the filesystem (journals are not
+assigned for spectator mounts) then this will give the
+numeric journal id in all GFS2 uevents.
+
+4. UUID=
+--------
+
+With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
+into the filesystem superblock. If it exists, this will
+be included in every uevent relating to the filesystem.
+
+
+
diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.txt
deleted file mode 100644
index 19a19ebebc34..000000000000
--- a/Documentation/filesystems/gfs2-uevents.txt
+++ /dev/null
@@ -1,100 +0,0 @@
-                              uevents and GFS2
-                             ==================
-
-During the lifetime of a GFS2 mount, a number of uevents are generated.
-This document explains what the events are and what they are used
-for (by gfs_controld in gfs2-utils).
-
-A list of GFS2 uevents
------------------------
-
-1. ADD
-
-The ADD event occurs at mount time. It will always be the first
-uevent generated by the newly created filesystem. If the mount
-is successful, an ONLINE uevent will follow.  If it is not successful
-then a REMOVE uevent will follow.
-
-The ADD uevent has two environment variables: SPECTATOR=[0|1]
-and RDONLY=[0|1] that specify the spectator status (a read-only mount
-with no journal assigned), and read-only (with journal assigned) status
-of the filesystem respectively.
-
-2. ONLINE
-
-The ONLINE uevent is generated after a successful mount or remount. It
-has the same environment variables as the ADD uevent. The ONLINE
-uevent, along with the two environment variables for spectator and
-RDONLY are a relatively recent addition (2.6.32-rc+) and will not
-be generated by older kernels.
-
-3. CHANGE
-
-The CHANGE uevent is used in two places. One is when reporting the
-successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
-This is used as a signal by gfs_controld that it is then ok for other
-nodes in the cluster to mount the filesystem.
-
-The other CHANGE uevent is used to inform of the completion
-of journal recovery for one of the filesystems journals. It has
-two environment variables, JID= which specifies the journal id which
-has just been recovered, and RECOVERY=[Done|Failed] to indicate the
-success (or otherwise) of the operation. These uevents are generated
-for every journal recovered, whether it is during the initial mount
-process or as the result of gfs_controld requesting a specific journal
-recovery via the /sys/fs/gfs2/<fsname>/lock_module/recovery file.
-
-Because the CHANGE uevent was used (in early versions of gfs_controld)
-without checking the environment variables to discover the state, we
-cannot add any more functions to it without running the risk of
-someone using an older version of the user tools and breaking their
-cluster. For this reason the ONLINE uevent was used when adding a new
-uevent for a successful mount or remount.
-
-4. OFFLINE
-
-The OFFLINE uevent is only generated due to filesystem errors and is used
-as part of the "withdraw" mechanism. Currently this doesn't give any
-information about what the error is, which is something that needs to
-be fixed.
-
-5. REMOVE
-
-The REMOVE uevent is generated at the end of an unsuccessful mount
-or at the end of a umount of the filesystem. All REMOVE uevents will
-have been preceded by at least an ADD uevent for the same filesystem,
-and unlike the other uevents is generated automatically by the kernel's
-kobject subsystem.
-
-
-Information common to all GFS2 uevents (uevent environment variables)
-----------------------------------------------------------------------
-
-1. LOCKTABLE=
-
-The LOCKTABLE is a string, as supplied on the mount command
-line (locktable=) or via fstab. It is used as a filesystem label
-as well as providing the information for a lock_dlm mount to be
-able to join the cluster.
-
-2. LOCKPROTO=
-
-The LOCKPROTO is a string, and its value depends on what is set
-on the mount command line, or via fstab. It will be either
-lock_nolock or lock_dlm. In the future other lock managers
-may be supported.
-
-3. JOURNALID=
-
-If a journal is in use by the filesystem (journals are not
-assigned for spectator mounts) then this will give the
-numeric journal id in all GFS2 uevents.
-
-4. UUID=
-
-With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
-into the filesystem superblock. If it exists, this will
-be included in every uevent relating to the filesystem.
-
-
-
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f24befe78326..c16e517e37c5 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -66,6 +66,7 @@ Documentation for filesystem implementations.
    ext3
    f2fs
    gfs2
+   gfs2-uevents
    fuse
    overlayfs
    virtiofs
-- 

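The GFS2 uevents described in gfs2-uevents.rst above carry their state in KEY=VALUE environment variables. Below is a minimal Python sketch of how a user-space listener, in the spirit of gfs_controld, might decode them. It assumes the kernel's netlink uevent payload layout: "action@devpath" followed by NUL-separated KEY=VALUE strings. `parse_uevent` and `describe_gfs2_event` are hypothetical helpers, not part of gfs2-utils. The NETLINK_KOBJECT_UEVENT socket setup is omitted, and messages rebroadcast by udev (which carry a different header) are not handled.

```python
def parse_uevent(raw: bytes):
    """Split a kernel uevent payload into (ACTION, env dict)."""
    fields = [f for f in raw.split(b"\0") if f]
    header = fields[0].decode()  # "action@devpath"
    env = {}
    for field in fields[1:]:
        key, sep, value = field.decode().partition("=")
        if sep:  # ignore anything that is not KEY=VALUE
            env[key] = value
    # Prefer the ACTION variable; fall back to the header prefix.
    action = env.get("ACTION", header.split("@", 1)[0])
    return action.upper(), env


def describe_gfs2_event(action, env):
    """Summarize an event using the variables documented above."""
    table = env.get("LOCKTABLE", "?")
    if action == "CHANGE" and "RECOVERY" in env:
        return f"{table}: journal {env.get('JID')} recovery {env['RECOVERY']}"
    if action == "CHANGE" and env.get("FIRSTMOUNT") == "Done":
        return f"{table}: first mount done, other nodes may mount"
    if action in ("ADD", "ONLINE"):
        return (f"{table}: {action.lower()} spectator={env.get('SPECTATOR')} "
                f"rdonly={env.get('RDONLY')}")
    return f"{table}: {action.lower()}"
```

Because early gfs_controld versions ignored the variables on CHANGE (see above), a robust listener dispatches on the variables present, such as RECOVERY and FIRSTMOUNT, and not on the action name alone.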

From cdded7db3625c98e66316911947bd3a1941992e2 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:07 +0100
Subject: docs: filesystems: convert hfsplus.txt to ReST

Just trivial changes:

- Add a SPDX header;
- Add it to filesystems/index.rst.

While here, adjust the document title to use the same style as
the other docs.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/4298409da951fbee000201a6c8d9c85e961b2b79.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/hfsplus.rst | 61 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/hfsplus.txt | 59 ---------------------------------
 Documentation/filesystems/index.rst   |  1 +
 3 files changed, 62 insertions(+), 59 deletions(-)
 create mode 100644 Documentation/filesystems/hfsplus.rst
 delete mode 100644 Documentation/filesystems/hfsplus.txt

diff --git a/Documentation/filesystems/hfsplus.rst b/Documentation/filesystems/hfsplus.rst
new file mode 100644
index 000000000000..f02f4f5fc020
--- /dev/null
+++ b/Documentation/filesystems/hfsplus.rst
@@ -0,0 +1,61 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Macintosh HFSPlus Filesystem for Linux
+======================================
+
+HFSPlus is a filesystem first introduced in MacOS 8.1.
+HFSPlus has several extensions to HFS, including 32-bit allocation
+blocks, 255-character Unicode filenames, and file sizes of up to 2^63 bytes.
+
+
+Mount options
+=============
+
+When mounting an HFSPlus filesystem, the following options are accepted:
+
+  creator=cccc, type=cccc
+	Specifies the creator/type values (as shown by the MacOS Finder)
+	used when creating new files.  Default values: '????'.
+
+  uid=n, gid=n
+	Specifies the user/group that owns all files on the filesystem
+	that have uninitialized permissions structures.
+	Default:  user/group id of the mounting process.
+
+  umask=n
+	Specifies the umask (in octal) used for files and directories
+	that have uninitialized permissions structures.
+	Default:  umask of the mounting process.
+
+  session=n
+	Select the CDROM session to mount as an HFSPlus filesystem.  Defaults to
+	leaving that decision to the CDROM driver.  This option will fail
+	with anything but a CDROM as the underlying device.
+
+  part=n
+	Select partition number n from the device.  This option only makes
+	sense for CDROMs because they can't be partitioned under Linux.
+	For disk devices the generic partition parsing code does this
+	for us.  Defaults to not parsing the partition table at all.
+
+  decompose
+	Decompose file name characters.
+
+  nodecompose
+	Do not decompose file name characters.
+
+  force
+	Used to force write access to volumes that are marked as journalled
+	or locked.  Use at your own risk.
+
+  nls=cccc
+	Encoding to use when presenting file names.
+
+
+References
+==========
+
+kernel source:		<file:fs/hfsplus>
+
+Apple Technote 1150	https://developer.apple.com/legacy/library/technotes/tn/tn1150.html
diff --git a/Documentation/filesystems/hfsplus.txt b/Documentation/filesystems/hfsplus.txt
deleted file mode 100644
index 59f7569fc9ed..000000000000
--- a/Documentation/filesystems/hfsplus.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-
-Macintosh HFSPlus Filesystem for Linux
-======================================
-
-HFSPlus is a filesystem first introduced in MacOS 8.1.
-HFSPlus has several extensions to HFS, including 32-bit allocation
-blocks, 255-character unicode filenames, and file sizes of 2^63 bytes.
-
-
-Mount options
-=============
-
-When mounting an HFSPlus filesystem, the following options are accepted:
-
-  creator=cccc, type=cccc
-	Specifies the creator/type values as shown by the MacOS finder
-	used for creating new files.  Default values: '????'.
-
-  uid=n, gid=n
-	Specifies the user/group that owns all files on the filesystem
-	that have uninitialized permissions structures.
-	Default:  user/group id of the mounting process.
-
-  umask=n
-	Specifies the umask (in octal) used for files and directories
-	that have uninitialized permissions structures.
-	Default:  umask of the mounting process.
-
-  session=n
-	Select the CDROM session to mount as HFSPlus filesystem.  Defaults to
-	leaving that decision to the CDROM driver.  This option will fail
-	with anything but a CDROM as underlying devices.
-
-  part=n
-	Select partition number n from the devices.  This option only makes
-	sense for CDROMs because they can't be partitioned under Linux.
-	For disk devices the generic partition parsing code does this
-	for us.  Defaults to not parsing the partition table at all.
-
-  decompose
-	Decompose file name characters.
-
-  nodecompose
-	Do not decompose file name characters.
-
-  force
-	Used to force write access to volumes that are marked as journalled
-	or locked.  Use at your own risk.
-
-  nls=cccc
-	Encoding to use when presenting file names.
-
-
-References
-==========
-
-kernel source:		<file:fs/hfsplus>
-
-Apple Technote 1150	https://developer.apple.com/legacy/library/technotes/tn/tn1150.html
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index c16e517e37c5..c351bc8a8c85 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -67,6 +67,7 @@ Documentation for filesystem implementations.
    f2fs
    gfs2
    gfs2-uevents
+   hfsplus
    fuse
    overlayfs
    virtiofs
-- 
cgit 


From 5040a0acc8f2300ef35a1d9cc1c50a25235e061d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:08 +0100
Subject: docs: filesystems: convert hfs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Use notes markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/8a625d6652d88809730020048d26c3b9333ddbdf.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/hfs.rst   | 87 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/hfs.txt   | 82 ----------------------------------
 Documentation/filesystems/index.rst |  1 +
 3 files changed, 88 insertions(+), 82 deletions(-)
 create mode 100644 Documentation/filesystems/hfs.rst
 delete mode 100644 Documentation/filesystems/hfs.txt

diff --git a/Documentation/filesystems/hfs.rst b/Documentation/filesystems/hfs.rst
new file mode 100644
index 000000000000..ab17a005e9b1
--- /dev/null
+++ b/Documentation/filesystems/hfs.rst
@@ -0,0 +1,87 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Macintosh HFS Filesystem for Linux
+==================================
+
+
+.. Note:: This filesystem doesn't have a maintainer.
+
+
+HFS stands for "Hierarchical File System" and is the filesystem used
+by the Mac Plus and all later Macintosh models.  Earlier Macintosh
+models used MFS ("Macintosh File System"), which is not supported.
+MacOS 8.1 and newer support a filesystem called HFS+ that's similar to
+HFS but is extended in various areas.  Use the hfsplus filesystem driver
+to access such filesystems from Linux.
+
+
+Mount options
+=============
+
+When mounting an HFS filesystem, the following options are accepted:
+
+  creator=cccc, type=cccc
+	Specifies the creator/type values as shown by the MacOS finder
+	used for creating new files.  Default values: '????'.
+
+  uid=n, gid=n
+	Specifies the user/group that owns all files on the filesystem.
+	Default:  user/group id of the mounting process.
+
+  dir_umask=n, file_umask=n, umask=n
+	Specifies the umask used for all directories, all files, or all files
+	and directories, respectively.  Defaults to the mounting process's umask.
+
+  session=n
+	Select the CDROM session to mount as an HFS filesystem.  Defaults to
+	leaving that decision to the CDROM driver.  This option will fail
+	with anything but a CDROM as the underlying device.
+
+  part=n
+	Select partition number n from the device.  This option only makes
+	sense for CDROMs because they can't be partitioned under Linux.
+	For disk devices the generic partition parsing code does this
+	for us.  Defaults to not parsing the partition table at all.
+
+  quiet
+  	Ignore invalid mount options instead of complaining.
+
+
+Writing to HFS Filesystems
+==========================
+
+HFS is not a UNIX filesystem, thus it does not have the usual features you'd
+expect:
+
+ * You can't modify the set-uid, set-gid, sticky or executable bits or the uid
+   and gid of files.
+ * You can't create hard- or symlinks, device files, sockets or FIFOs.
+
+HFS, on the other hand, has the concept of multiple forks per file.  These
+non-standard forks are represented as hidden additional files in the normal
+filesystem namespace, which is something of a kludge and makes their
+semantics a little strange:
+
+ * You can't create, delete or rename resource forks of files or the
+   Finder's metadata.
+ * They are however created (with default values), deleted and renamed
+   along with the corresponding data fork or directory.
+ * Copying files to a different filesystem will lose those attributes
+   that are essential for MacOS to work.
+
+
+Creating HFS filesystems
+========================
+
+The hfsutils package from Robert Leslie contains a program called
+hformat that can be used to create HFS filesystems. See
+<http://www.mars.org/home/rob/proj/hfs/> for details.
+
+
+Credits
+=======
+
+The HFS driver was written by Paul H. Hargrove (hargrove@sccm.Stanford.EDU).
+Roman Zippel (roman@ardistech.com) rewrote large parts of the code and brought
+in btree routines derived from Brad Boyer's hfsplus driver.
diff --git a/Documentation/filesystems/hfs.txt b/Documentation/filesystems/hfs.txt
deleted file mode 100644
index d096df6db07a..000000000000
--- a/Documentation/filesystems/hfs.txt
+++ /dev/null
@@ -1,82 +0,0 @@
-Note: This filesystem doesn't have a maintainer.
-
-Macintosh HFS Filesystem for Linux
-==================================
-
-HFS stands for ``Hierarchical File System'' and is the filesystem used
-by the Mac Plus and all later Macintosh models.  Earlier Macintosh
-models used MFS (``Macintosh File System''), which is not supported,
-MacOS 8.1 and newer support a filesystem called HFS+ that's similar to
-HFS but is extended in various areas.  Use the hfsplus filesystem driver
-to access such filesystems from Linux.
-
-
-Mount options
-=============
-
-When mounting an HFS filesystem, the following options are accepted:
-
-  creator=cccc, type=cccc
-	Specifies the creator/type values as shown by the MacOS finder
-	used for creating new files.  Default values: '????'.
-
-  uid=n, gid=n
-  	Specifies the user/group that owns all files on the filesystems.
-	Default:  user/group id of the mounting process.
-
-  dir_umask=n, file_umask=n, umask=n
-	Specifies the umask used for all files , all directories or all
-	files and directories.  Defaults to the umask of the mounting process.
-
-  session=n
-  	Select the CDROM session to mount as HFS filesystem.  Defaults to
-	leaving that decision to the CDROM driver.  This option will fail
-	with anything but a CDROM as underlying devices.
-
-  part=n
-  	Select partition number n from the devices.  Does only makes
-	sense for CDROMS because they can't be partitioned under Linux.
-	For disk devices the generic partition parsing code does this
-	for us.  Defaults to not parsing the partition table at all.
-
-  quiet
-  	Ignore invalid mount options instead of complaining.
-
-
-Writing to HFS Filesystems
-==========================
-
-HFS is not a UNIX filesystem, thus it does not have the usual features you'd
-expect:
-
- o You can't modify the set-uid, set-gid, sticky or executable bits or the uid
-   and gid of files.
- o You can't create hard- or symlinks, device files, sockets or FIFOs.
-
-HFS does on the other have the concepts of multiple forks per file.  These
-non-standard forks are represented as hidden additional files in the normal
-filesystems namespace which is kind of a cludge and makes the semantics for
-the a little strange:
-
- o You can't create, delete or rename resource forks of files or the
-   Finder's metadata.
- o They are however created (with default values), deleted and renamed
-   along with the corresponding data fork or directory.
- o Copying files to a different filesystem will loose those attributes
-   that are essential for MacOS to work.
-
-
-Creating HFS filesystems
-===================================
-
-The hfsutils package from Robert Leslie contains a program called
-hformat that can be used to create HFS filesystem. See
-<http://www.mars.org/home/rob/proj/hfs/> for details.
-
-
-Credits
-=======
-
-The HFS drivers was written by Paul H. Hargrovea (hargrove@sccm.Stanford.EDU).
-Roman Zippel (roman@ardistech.com) rewrote large parts of the code and brought
-in btree routines derived from Brad Boyer's hfsplus driver.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index c351bc8a8c85..f776411340cb 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -67,6 +67,7 @@ Documentation for filesystem implementations.
    f2fs
    gfs2
    gfs2-uevents
+   hfs
    hfsplus
    fuse
    overlayfs
-- 
cgit 


From a1ef4bcd1664a9c1ae5191598b769ab37b93aa57 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:09 +0100
Subject: docs: filesystems: convert hpfs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/581019c3120938118aa55ba28902b62083c3f37a.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/hpfs.rst  | 353 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/hpfs.txt  | 296 ------------------------------
 Documentation/filesystems/index.rst |   1 +
 3 files changed, 354 insertions(+), 296 deletions(-)
 create mode 100644 Documentation/filesystems/hpfs.rst
 delete mode 100644 Documentation/filesystems/hpfs.txt

diff --git a/Documentation/filesystems/hpfs.rst b/Documentation/filesystems/hpfs.rst
new file mode 100644
index 000000000000..0db152278572
--- /dev/null
+++ b/Documentation/filesystems/hpfs.rst
@@ -0,0 +1,353 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Read/Write HPFS 2.09
+====================
+
+1998-2004, Mikulas Patocka
+
+:email: mikulas@artax.karlin.mff.cuni.cz
+:homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
+
+Credits
+=======
+Chris Smith, 1993, original read-only HPFS; some code and the hpfs structures
+file are taken from it
+
+Jacques Gelinas, MSDos mmap, inspired by fs/nfs/mmap.c (Jon Tombs 15 Aug 1993)
+
+Werner Almesberger, 1992, 1993, MSDos option parser & CR/LF conversion
+
+Mount options
+=============
+uid=xxx,gid=xxx,umask=xxx (default uid=gid=0 umask=default_system_umask)
+	Set owner/group/mode for files that do not have it specified in extended
+	attributes. Mode is inverted umask - for example umask 027 gives owner
+	all permissions, group read permission and anybody else no access. Note
+	that for files, mode is ANDed with 0666. If you want files to have 'x'
+	rights, you must use extended attributes.
+case=lower,asis (default asis)
+	File name lowercasing in readdir.
+conv=binary,text,auto (default binary)
+	CR/LF -> LF conversion. If auto, the decision is made by file extension
+	- there is a list of text extensions (I think it's better not to convert a
+	text file than to damage a binary file). If you want to change that list,
+	change it in the source. Original read-only HPFS contained some strange
+	heuristic algorithm that I removed. I think it's dangerous to let the
+	computer decide whether a file is text or binary. For example, DJGPP
+	binaries contain small text message at the beginning and they could be
+	misidentified and damaged under some circumstances.
+check=none,normal,strict (default normal)
+	Check level. Selecting none gives only a small speedup at the cost of
+	great danger. I tried to write it so that it won't crash with check=normal on
+	corrupted filesystems. check=strict means many superfluous checks -
+	used for debugging (for example it checks if file is allocated in
+	bitmaps when accessing it).
+errors=continue,remount-ro,panic (default remount-ro)
+	Behaviour when filesystem errors are found.
+chkdsk=no,errors,always (default errors)
+	When to mark filesystem dirty so that OS/2 checks it.
+eas=no,ro,rw (default rw)
+	What to do with extended attributes. 'no' - ignore them and always use
+	values specified in uid/gid/mode options. 'ro' - read extended
+	attributes but do not create them. 'rw' - create extended attributes
+	when you use chmod/chown/chgrp/mknod/ln -s on the filesystem.
+timeshift=(-)nnn (default 0)
+	Shifts the time by nnn seconds. For example, if times under Linux are
+	one hour ahead of those under OS/2, use timeshift=-3600.
+
+
+File names
+==========
+
+As in OS/2, filenames are case insensitive. However, the shell treats names as
+case sensitive, so for example when you create a file FOO, you can use
+'cat FOO', 'cat Foo', 'cat foo' or 'cat F*' but not 'cat f*'. Note that you
+also won't be able to compile the Linux kernel (and maybe other things) on HPFS
+because the kernel build creates different files with names like bootsect.S and
+bootsect.s. When searching for a file whose name has characters >= 128, codepages
+are used - see below.
+OS/2 ignores dots and spaces at the end of a file name, so this driver does as
+well. If you create 'a. ...', the file 'a' will be created, but you can still
+access it under names 'a.', 'a..', 'a .  . . ' etc.
+
+
+Extended attributes
+===================
+
+On HPFS partitions, OS/2 can associate with each file special information called
+extended attributes. Extended attributes are pairs of (key,value) where key is
+an ASCII string identifying that attribute and value is any string of bytes of
+variable length. OS/2 stores window and icon positions and file types there. So
+why not use them for Unix-specific info like file owner or access rights? This
+driver can do it. If you chown/chgrp/chmod on an HPFS partition, extended
+attributes with keys "UID", "GID" or "MODE" and 2-byte values are created. Only
+those extended attributes whose values differ from the defaults specified in mount
+options are created. Once created, the extended attributes are never deleted,
+they're just changed. This means that when your default is uid=0 and you type
+something like 'chown luser file; chown root file' the file will contain
+extended attribute UID=0. And when you umount the fs and mount it again with
+uid=luser_uid, the file will still be owned by root! If you chmod a file to 444,
+extended attribute "MODE" will not be set; this special case is handled by setting
+the read-only flag. When you mknod a block or char device, besides "MODE", the
+special 4-byte extended attribute "DEV" will be created containing the device
+number. Currently this driver cannot resize extended attributes - it means
+that if somebody (I don't know who?) has set "UID", "GID", "MODE" or "DEV"
+attributes with different sizes, they won't be rewritten and changing these
+values doesn't work.
+
+
+Symlinks
+========
+
+You can create symlinks on an HPFS partition; they are implemented by setting an
+extended attribute named "SYMLINK" to the symlink value. Like on ext2, you can
+chown and chgrp symlinks but I don't know what it is good for. chmoding a symlink
+results in chmoding the file the symlink points to. These symlinks are just for
+Linux use and incompatible with OS/2. OS/2 PmShell symlinks are not supported because
+they are stored in a very crazy way. They tried to make the link change when the file is
+moved ... sometimes it works. But the link is partly stored in directory
+extended attributes and partly in OS2SYS.INI. I don't want (and don't know how)
+to analyze or change OS2SYS.INI.
+
+
+Codepages
+=========
+
+HPFS can contain several uppercasing tables for several codepages and each
+file has a pointer to the codepage its name is in. However OS/2 was created in
+America where people don't care much about codepages and so multiple codepage
+support is quite buggy. I have Czech OS/2 working in codepage 852 on my disk.
+Once I booted English OS/2 working in cp 850 and I created a file on my 852
+partition. It marked the file name codepage as 850 - good. But when I again booted
+Czech OS/2, the file was completely inaccessible under any name. It seems that
+OS/2 uppercases the search pattern with its system code page (852) and the file
+name it's comparing against with the file's code page (850). These could never match. Is that
+really what IBM developers wanted? But the problems continued. When I created
+another file in that directory in Czech OS/2, that file was inaccessible too. OS/2
+probably uses a different uppercasing method when searching where to place a file
+(note that files in an HPFS directory must be sorted) and when searching for
+a file. Finally when I opened this directory in PmShell, PmShell crashed (the
+funny thing was that, when rebooted, PmShell tried to reopen this directory
+again :-). chkdsk happily ignores these errors and only low-level disk
+modification saved me.  Never mix different language versions of OS/2 on one
+system, although HPFS was designed to allow that.
+OK, I could implement complex codepage support in this driver but I think it
+would cause more problems than benefits given such a buggy implementation in OS/2.
+So this driver simply uses the first codepage it finds for uppercasing and
+lowercasing, regardless of the file's codepage index. Usually all file names are in
+this codepage - if you don't try to do what I described above :-)
+
+
+Known bugs
+==========
+
+HPFS386 on OS/2 server is not supported. HPFS386 installed on a normal OS/2 client
+should work. If you have OS/2 server, use only read-only mode. I don't know how
+to handle some HPFS386 structures like access control lists or extended perm
+lists: I don't know how to delete them when a file is deleted and how to avoid
+overwriting them with extended attributes. Send me some info on these structures
+and I'll implement it. However, this driver should detect the presence of HPFS386
+structures, remount read-only and not destroy them (I hope).
+
+When there's not enough space for extended attributes, they will be truncated
+and no error is returned.
+
+OS/2 can't access files if the path is longer than about 256 chars but this
+driver allows you to do it. chkdsk ignores such errors.
+
+Sometimes you won't be able to delete some files on a very full filesystem
+(returning error ENOSPC). That's because a file in a non-leaf node of the directory
+tree (on HPFS, a large directory keeps its dirents in a tree) must be replaced
+with another node when deleted. That new file might have a larger name than
+the old one, so the new name doesn't fit in the directory node (dnode). That
+would result in the directory tree splitting, which takes disk space. The workaround
+is to delete other files that are leaves (the probability that a file is non-leaf
+is about 1/50) or to truncate the file first to make some space.
+You encounter this problem only if you have many directories so that the
+preallocated directory band is full, i.e.::
+
+	number_of_directories / size_of_filesystem_in_mb > 4.
+
+You can't delete open directories.
+
+You can't rename over directories (what is it good for?).
+
+Renaming files so that only the case changes doesn't work. This driver supports it
+but the VFS doesn't. Something like 'mv file FILE' won't work.
+
+Atimes and directory mtimes are not updated at all, for performance reasons.
+If you really need them updated, let me know and I'll write it (but
+it will be slow).
+
+When the system is out of memory and swap, it may slightly corrupt the filesystem
+(lost files, unbalanced directories). (I guess all filesystems may do it.)
+
+When compiled, you get the warning "function declaration isn't a prototype". Does
+anybody know what it means?
+
+
+What does "unbalanced tree" message mean?
+=========================================
+
+Old versions of this driver sometimes created unbalanced dnode trees. OS/2
+chkdsk doesn't scream if the tree is unbalanced (and sometimes creates
+unbalanced trees too :-) but both HPFS and HPFS386 contain a bug that rarely
+causes a crash when the tree is not balanced. This driver handles unbalanced trees
+correctly and writes a warning if it finds them. If you see this message, this is
+probably because of directories created with an old version of this driver.
+The workaround is to move all files from that directory to another and then back
+again. Do it in Linux, not OS/2! If you see this message in a directory that was
+created entirely by this driver, it is a BUG - let me know about it.
+
+
+Bugs in OS/2
+============
+
+When you have two (or more) lost directories pointing to each other, chkdsk
+locks up when repairing the filesystem.
+
+Sometimes (I think it's random) when you create a file with a one-char name under
+OS/2, OS/2 marks it as 'long'. chkdsk then removes this flag saying "Minor fs
+error corrected".
+
+File names like "a .b" are marked as 'long' by OS/2 but chkdsk "corrects" it and
+marks them as short (and writes "minor fs error corrected"). This bug is not in
+HPFS386.
+
+Codepage bugs described above.
+
+
+If you don't install fixpacks, there are many, many more...
+
+
+History
+=======
+
+====== =========================================================================
+0.90   First public release
+0.91   Fixed bug that caused shooting to memory when write_inode was called on
+       open inode (rarely happened)
+0.92   Fixed a little memory leak in freeing directory inodes
+0.93   Fixed bug that locked up the machine when there were too many filenames
+       with first 15 characters same
+       Fixed write_file to zero file when writing behind file end
+0.94   Fixed a little memory leak when trying to delete busy file or directory
+0.95   Fixed a bug that i_hpfs_parent_dir was not updated when moving files
+1.90   First version for 2.1.1xx kernels
+1.91   Fixed a bug that chk_sectors failed when sectors were at the end of disk
+       Fixed a race-condition when write_inode is called while deleting file
+       Fixed a bug that could possibly happen (with very low probability) when
+       using 0xff in filenames.
+
+       Rewritten locking to avoid race-conditions
+
+       Mount option 'eas' now works
+
+       Fsync no longer returns error
+
+       Files beginning with '.' are marked hidden
+
+       Remount support added
+
+       Alloc is not so slow when filesystem becomes full
+
+       Atimes are no longer updated because it slows down operation
+
+       Code cleanup (removed all commented debug prints)
+1.92   Corrected a bug when sync was called just before closing file
+1.93   Modified, so that it works with kernels >= 2.1.131, I don't know if it
+       works with previous versions
+
+       Fixed a possible problem with disks > 64G (but I don't have one, so I can't
+       test it)
+
+       Fixed a file overflow at 2G
+
+       Added new option 'timeshift'
+
+       Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
+       read-only mode
+
+       Fixed a bug that slowed down alloc and prevented allocating 100% space
+       (this bug was not destructive)
+1.94   Added workaround for one bug in Linux
+
+       Fixed one buffer leak
+
+       Fixed some incompatibilities with large extended attributes (but it's still
+       not 100% ok, I have no info on it and OS/2 doesn't want to create them)
+
+       Rewritten allocation
+
+       Fixed a bug with i_blocks (du sometimes didn't display correct values)
+
+       Directories have no longer archive attribute set (some programs don't like
+       it)
+
+       Fixed a bug that set one flag incorrectly in a large anode tree (it was not
+       destructive)
+1.95   Fixed one buffer leak, that could happen on corrupted filesystem
+
+       Fixed one bug in allocation in 1.94
+1.96   Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
+       error sometimes when opening directories in PMSHELL)
+
+       Fixed a possible bitmap race
+
+       Fixed possible problem on large disks
+
+       You can now delete open files
+
+       Fixed a nondestructive race in rename
+1.97   Support for HPFS v3 (on large partitions)
+
+       Fixed a bug that didn't allow creation of files > 128M
+       (it should be 2G)
+1.97.1 Changed names of global symbols
+
+       Fixed a bug when chmoding or chowning root directory
+1.98   Fixed a deadlock when using old_readdir
+       Better directory handling; workaround for "unbalanced tree" bug in OS/2
+1.99   Corrected a possible problem when there's not enough space while deleting
+       file
+
+       Now it tries to truncate the file if there's not enough space when
+       deleting
+
+       Removed a lot of redundant code
+2.00   Fixed a bug in rename (it was there since 1.96)
+       Better anti-fragmentation strategy
+2.01   Fixed problem with directory listing over NFS
+
+       Directory lseek now checks for proper parameters
+
+       Fixed race-condition in buffer code - it is in all filesystems in Linux;
+       when reading device (cat /dev/hda) while creating files on it, files
+       could be damaged
+2.02   Workaround for bug in breada in Linux. breada could cause accesses beyond
+       end of partition
+2.03   Char, block devices and pipes are correctly created
+
+       Fixed non-crashing race in unlink (Alexander Viro)
+
+       Now it works with Japanese version of OS/2
+2.04   Fixed error when ftruncate used to extend file
+2.05   Fixed crash when got mount parameters without =
+
+       Fixed crash when allocation of anode failed due to full disk
+
+       Fixed some crashes when block io or inode allocation failed
+2.06   Fixed some crash on corrupted disk structures
+
+       Better allocation strategy
+
+       Reschedule points added so that it doesn't lock the CPU for a long time
+
+       It should work in read-only mode on Warp Server
+2.07   More fixes for Warp Server. Now it really works
+2.08   Creating new files is not so slow on large disks
+
+       An attempt to sync deleted file does not generate filesystem error
+2.09   Fixed error on extremely fragmented files
+====== =========================================================================
diff --git a/Documentation/filesystems/hpfs.txt b/Documentation/filesystems/hpfs.txt
deleted file mode 100644
index 74630bd504fb..000000000000
--- a/Documentation/filesystems/hpfs.txt
+++ /dev/null
@@ -1,296 +0,0 @@
-Read/Write HPFS 2.09
-1998-2004, Mikulas Patocka
-
-email: mikulas@artax.karlin.mff.cuni.cz
-homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
-
-CREDITS:
-Chris Smith, 1993, original read-only HPFS, some code and hpfs structures file
-	is taken from it
-Jacques Gelinas, MSDos mmap, Inspired by fs/nfs/mmap.c (Jon Tombs 15 Aug 1993)
-Werner Almesberger, 1992, 1993, MSDos option parser & CR/LF conversion
-
-Mount options
-
-uid=xxx,gid=xxx,umask=xxx (default uid=gid=0 umask=default_system_umask)
-	Set owner/group/mode for files that do not have it specified in extended
-	attributes. Mode is inverted umask - for example umask 027 gives owner
-	all permission, group read permission and anybody else no access. Note
-	that for files mode is anded with 0666. If you want files to have 'x'
-	rights, you must use extended attributes.
-case=lower,asis (default asis)
-	File name lowercasing in readdir.
-conv=binary,text,auto (default binary)
-	CR/LF -> LF conversion, if auto, decision is made according to extension
-	- there is a list of text extensions (I thing it's better to not convert
-	text file than to damage binary file). If you want to change that list,
-	change it in the source. Original readonly HPFS contained some strange
-	heuristic algorithm that I removed. I thing it's danger to let the
-	computer decide whether file is text or binary. For example, DJGPP
-	binaries contain small text message at the beginning and they could be
-	misidentified and damaged under some circumstances.
-check=none,normal,strict (default normal)
-	Check level. Selecting none will cause only little speedup and big
-	danger. I tried to write it so that it won't crash if check=normal on
-	corrupted filesystems. check=strict means many superfluous checks -
-	used for debugging (for example it checks if file is allocated in
-	bitmaps when accessing it).
-errors=continue,remount-ro,panic (default remount-ro)
-	Behaviour when filesystem errors found.
-chkdsk=no,errors,always (default errors)
-	When to mark filesystem dirty so that OS/2 checks it.
-eas=no,ro,rw (default rw)
-	What to do with extended attributes. 'no' - ignore them and use always
-	values specified in uid/gid/mode options. 'ro' - read extended
-	attributes but do not create them. 'rw' - create extended attributes
-	when you use chmod/chown/chgrp/mknod/ln -s on the filesystem.
-timeshift=(-)nnn (default 0)
-	Shifts the time by nnn seconds. For example, if you see under linux
-	one hour more, than under os/2, use timeshift=-3600.
-
-
-File names
-
-As in OS/2, filenames are case insensitive. However, shell thinks that names
-are case sensitive, so for example when you create a file FOO, you can use
-'cat FOO', 'cat Foo', 'cat foo' or 'cat F*' but not 'cat f*'. Note, that you
-also won't be able to compile linux kernel (and maybe other things) on HPFS
-because kernel creates different files with names like bootsect.S and
-bootsect.s. When searching for file thats name has characters >= 128, codepages
-are used - see below.
-OS/2 ignores dots and spaces at the end of file name, so this driver does as
-well. If you create 'a. ...', the file 'a' will be created, but you can still
-access it under names 'a.', 'a..', 'a .  . . ' etc.
-
-
-Extended attributes
-
-On HPFS partitions, OS/2 can associate to each file a special information called
-extended attributes. Extended attributes are pairs of (key,value) where key is
-an ascii string identifying that attribute and value is any string of bytes of
-variable length. OS/2 stores window and icon positions and file types there. So
-why not use it for unix-specific info like file owner or access rights? This
-driver can do it. If you chown/chgrp/chmod on a hpfs partition, extended
-attributes with keys "UID", "GID" or "MODE" and 2-byte values are created. Only
-that extended attributes those value differs from defaults specified in mount
-options are created. Once created, the extended attributes are never deleted,
-they're just changed. It means that when your default uid=0 and you type
-something like 'chown luser file; chown root file' the file will contain
-extended attribute UID=0. And when you umount the fs and mount it again with
-uid=luser_uid, the file will be still owned by root! If you chmod file to 444,
-extended attribute "MODE" will not be set, this special case is done by setting
-read-only flag. When you mknod a block or char device, besides "MODE", the
-special 4-byte extended attribute "DEV" will be created containing the device
-number. Currently this driver cannot resize extended attributes - it means
-that if somebody (I don't know who?) has set "UID", "GID", "MODE" or "DEV"
-attributes with different sizes, they won't be rewritten and changing these
-values doesn't work.
-
-
-Symlinks
-
-You can do symlinks on HPFS partition, symlinks are achieved by setting extended
-attribute named "SYMLINK" with symlink value. Like on ext2, you can chown and
-chgrp symlinks but I don't know what is it good for. chmoding symlink results
-in chmoding file where symlink points. These symlinks are just for Linux use and
-incompatible with OS/2. OS/2 PmShell symlinks are not supported because they are
-stored in very crazy way. They tried to do it so that link changes when file is
-moved ... sometimes it works. But the link is partly stored in directory
-extended attributes and partly in OS2SYS.INI. I don't want (and don't know how)
-to analyze or change OS2SYS.INI.
-
-
-Codepages
-
-HPFS can contain several uppercasing tables for several codepages and each
-file has a pointer to codepage its name is in. However OS/2 was created in
-America where people don't care much about codepages and so multiple codepages
-support is quite buggy. I have Czech OS/2 working in codepage 852 on my disk.
-Once I booted English OS/2 working in cp 850 and I created a file on my 852
-partition. It marked file name codepage as 850 - good. But when I again booted
-Czech OS/2, the file was completely inaccessible under any name. It seems that
-OS/2 uppercases the search pattern with its system code page (852) and file
-name it's comparing to with its code page (850). These could never match. Is it
-really what IBM developers wanted? But problems continued. When I created in
-Czech OS/2 another file in that directory, that file was inaccessible too. OS/2
-probably uses different uppercasing method when searching where to place a file
-(note, that files in HPFS directory must be sorted) and when searching for
-a file. Finally when I opened this directory in PmShell, PmShell crashed (the
-funny thing was that, when rebooted, PmShell tried to reopen this directory
-again :-). chkdsk happily ignores these errors and only low-level disk
-modification saved me.  Never mix different language versions of OS/2 on one
-system although HPFS was designed to allow that.
-OK, I could implement complex codepage support to this driver but I think it
-would cause more problems than benefit with such buggy implementation in OS/2.
-So this driver simply uses first codepage it finds for uppercasing and
-lowercasing no matter what's file codepage index. Usually all file names are in
-this codepage - if you don't try to do what I described above :-)
-
-
-Known bugs
-
-HPFS386 on OS/2 server is not supported. HPFS386 installed on normal OS/2 client
-should work. If you have OS/2 server, use only read-only mode. I don't know how
-to handle some HPFS386 structures like access control list or extended perm
-list, I don't know how to delete them when file is deleted and how to not
-overwrite them with extended attributes. Send me some info on these structures
-and I'll make it. However, this driver should detect presence of HPFS386
-structures, remount read-only and not destroy them (I hope).
-
-When there's not enough space for extended attributes, they will be truncated
-and no error is returned.
-
-OS/2 can't access files if the path is longer than about 256 chars but this
-driver allows you to do it. chkdsk ignores such errors.
-
-Sometimes you won't be able to delete some files on a very full filesystem
-(returning error ENOSPC). That's because file in non-leaf node in directory tree
-(one directory, if it's large, has dirents in tree on HPFS) must be replaced
-with another node when deleted. And that new file might have larger name than
-the old one so the new name doesn't fit in directory node (dnode). And that
-would result in directory tree splitting, that takes disk space. Workaround is
-to delete other files that are leaf (probability that the file is non-leaf is
-about 1/50) or to truncate file first to make some space.
-You encounter this problem only if you have many directories so that
-preallocated directory band is full i.e.
-	number_of_directories / size_of_filesystem_in_mb > 4.
-
-You can't delete open directories.
-
-You can't rename over directories (what is it good for?).
-
-Renaming files so that only case changes doesn't work. This driver supports it
-but vfs doesn't. Something like 'mv file FILE' won't work.
-
-All atimes and directory mtimes are not updated. That's because of performance
-reasons. If you extremely wish to update them, let me know, I'll write it (but
-it will be slow).
-
-When the system is out of memory and swap, it may slightly corrupt filesystem
-(lost files, unbalanced directories). (I guess all filesystem may do it).
-
-When compiled, you get warning: function declaration isn't a prototype. Does
-anybody know what does it mean?
-
-
-What does "unbalanced tree" message mean?
-
-Old versions of this driver created sometimes unbalanced dnode trees. OS/2
-chkdsk doesn't scream if the tree is unbalanced (and sometimes creates
-unbalanced trees too :-) but both HPFS and HPFS386 contain bug that it rarely
-crashes when the tree is not balanced. This driver handles unbalanced trees
-correctly and writes warning if it finds them. If you see this message, this is
-probably because of directories created with old version of this driver.
-Workaround is to move all files from that directory to another and then back
-again. Do it in Linux, not OS/2! If you see this message in directory that is
-whole created by this driver, it is BUG - let me know about it.
-
-
-Bugs in OS/2
-
-When you have two (or more) lost directories pointing each to other, chkdsk
-locks up when repairing filesystem.
-
-Sometimes (I think it's random) when you create a file with one-char name under
-OS/2, OS/2 marks it as 'long'. chkdsk then removes this flag saying "Minor fs
-error corrected".
-
-File names like "a .b" are marked as 'long' by OS/2 but chkdsk "corrects" it and
-marks them as short (and writes "minor fs error corrected"). This bug is not in
-HPFS386.
-
-Codepage bugs described above.
-
-If you don't install fixpacks, there are many, many more...
-
-
-History
-
-0.90 First public release
-0.91 Fixed bug that caused shooting to memory when write_inode was called on
-	open inode (rarely happened)
-0.92 Fixed a little memory leak in freeing directory inodes
-0.93 Fixed bug that locked up the machine when there were too many filenames
-	with first 15 characters same
-     Fixed write_file to zero file when writing behind file end
-0.94 Fixed a little memory leak when trying to delete busy file or directory
-0.95 Fixed a bug that i_hpfs_parent_dir was not updated when moving files
-1.90 First version for 2.1.1xx kernels
-1.91 Fixed a bug that chk_sectors failed when sectors were at the end of disk
-     Fixed a race-condition when write_inode is called while deleting file
-     Fixed a bug that could possibly happen (with very low probability) when
-     	using 0xff in filenames
-     Rewritten locking to avoid race-conditions
-     Mount option 'eas' now works
-     Fsync no longer returns error
-     Files beginning with '.' are marked hidden
-     Remount support added
-     Alloc is not so slow when filesystem becomes full
-     Atimes are no more updated because it slows down operation
-     Code cleanup (removed all commented debug prints)
-1.92 Corrected a bug when sync was called just before closing file
-1.93 Modified, so that it works with kernels >= 2.1.131, I don't know if it
-	works with previous versions
-     Fixed a possible problem with disks > 64G (but I don't have one, so I can't
-     	test it)
-     Fixed a file overflow at 2G
-     Added new option 'timeshift'
-     Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
-     	read-only mode
-     Fixed a bug that slowed down alloc and prevented allocating 100% space
-     	(this bug was not destructive)
-1.94 Added workaround for one bug in Linux
-     Fixed one buffer leak
-     Fixed some incompatibilities with large extended attributes (but it's still
-	not 100% ok, I have no info on it and OS/2 doesn't want to create them)
-     Rewritten allocation
-     Fixed a bug with i_blocks (du sometimes didn't display correct values)
-     Directories have no longer archive attribute set (some programs don't like
-	it)
-     Fixed a bug that it set badly one flag in large anode tree (it was not
-	destructive)
-1.95 Fixed one buffer leak, that could happen on corrupted filesystem
-     Fixed one bug in allocation in 1.94
-1.96 Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
-	error sometimes when opening directories in PMSHELL)
-     Fixed a possible bitmap race
-     Fixed possible problem on large disks
-     You can now delete open files
-     Fixed a nondestructive race in rename
-1.97 Support for HPFS v3 (on large partitions)
-     Fixed a bug that it didn't allow creation of files > 128M (it should be 2G)
-1.97.1 Changed names of global symbols
-       Fixed a bug when chmoding or chowning root directory
-1.98 Fixed a deadlock when using old_readdir
-     Better directory handling; workaround for "unbalanced tree" bug in OS/2
-1.99 Corrected a possible problem when there's not enough space while deleting
-	file
-     Now it tries to truncate the file if there's not enough space when deleting
-     Removed a lot of redundant code
-2.00 Fixed a bug in rename (it was there since 1.96)
-     Better anti-fragmentation strategy
-2.01 Fixed problem with directory listing over NFS
-     Directory lseek now checks for proper parameters
-     Fixed race-condition in buffer code - it is in all filesystems in Linux;
-        when reading device (cat /dev/hda) while creating files on it, files
-        could be damaged
-2.02 Workaround for bug in breada in Linux. breada could cause accesses beyond
-        end of partition
-2.03 Char, block devices and pipes are correctly created
-     Fixed non-crashing race in unlink (Alexander Viro)
-     Now it works with Japanese version of OS/2
-2.04 Fixed error when ftruncate used to extend file
-2.05 Fixed crash when got mount parameters without =
-     Fixed crash when allocation of anode failed due to full disk
-     Fixed some crashes when block io or inode allocation failed
-2.06 Fixed some crash on corrupted disk structures
-     Better allocation strategy
-     Reschedule points added so that it doesn't lock CPU long time
-     It should work in read-only mode on Warp Server
-2.07 More fixes for Warp Server. Now it really works
-2.08 Creating new files is not so slow on large disks
-     An attempt to sync deleted file does not generate filesystem error
-2.09 Fixed error on extremely fragmented files
-
-
- vim: set textwidth=80:
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f776411340cb..3fbe2fa0b5c5 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -69,6 +69,7 @@ Documentation for filesystem implementations.
    gfs2-uevents
    hfs
    hfsplus
+   hpfs
    fuse
    overlayfs
    virtiofs
-- 


From de389cf08d4708d0a03516e5ce0e193f49f0b358 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:10 +0100
Subject: docs: filesystems: convert inotify.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust document title;
- Fix list markups;
- Some whitespace fixes and new line breaks;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/8f846843ecf1914988feb4d001e3a53d27dc1a65.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst   |  1 +
 Documentation/filesystems/inotify.rst | 90 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/inotify.txt | 79 ------------------------------
 3 files changed, 91 insertions(+), 79 deletions(-)
 create mode 100644 Documentation/filesystems/inotify.rst
 delete mode 100644 Documentation/filesystems/inotify.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 3fbe2fa0b5c5..5a737722652c 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -70,6 +70,7 @@ Documentation for filesystem implementations.
    hfs
    hfsplus
    hpfs
+   inotify
    fuse
    overlayfs
    virtiofs
diff --git a/Documentation/filesystems/inotify.rst b/Documentation/filesystems/inotify.rst
new file mode 100644
index 000000000000..7f7ef8af0e1e
--- /dev/null
+++ b/Documentation/filesystems/inotify.rst
@@ -0,0 +1,90 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================================================
+Inotify - A Powerful yet Simple File Change Notification System
+===============================================================
+
+
+
+Document started 15 Mar 2005 by Robert Love <rml@novell.com>
+
+Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>
+
+	- Deleted the obsolete interface; refer to the man pages for the user interface.
+
+(i) Rationale
+
+Q:
+   What is the design decision behind not tying the watch to the open fd of
+   the watched object?
+
+A:
+   Watches are associated with an open inotify device, not an open file.
+   This solves the primary problem with dnotify: keeping the file open pins
+   the file and thus, worse, pins the mount.  Dnotify is therefore infeasible
+   for use on a desktop system with removable media as the media cannot be
+   unmounted.  Watching a file should not require that it be open.
+
+Q:
+   What is the design decision behind using an fd-per-instance as opposed to
+   an fd-per-watch?
+
+A:
+   An fd-per-watch quickly consumes more file descriptors than are allowed,
+   more fd's than are feasible to manage, and more fd's than are optimally
+   select()-able.  Yes, root can bump the per-process fd limit and yes, users
+   can use epoll, but requiring both is a silly and extraneous requirement.
+   A watch consumes less memory than an open file; separating the number
+   spaces is thus sensible.  The current design is what user-space developers
+   want: Users initialize inotify, once, and add n watches, requiring but one
+   fd and no twiddling with fd limits.  Initializing an inotify instance two
+   thousand times is silly.  If we can implement user-space's preferences
+   cleanly--and we can, the idr layer makes stuff like this trivial--then we
+   should.
+
+   There are other good arguments.  With a single fd, there is a single
+   item to block on, which is mapped to a single queue of events.  The single
+   fd returns all watch events and also any potential out-of-band data.  If
+   every fd were a separate watch,
+
+   - There would be no way to get event ordering.  Events on file foo and
+     file bar would pop poll() on both fd's, but there would be no way to tell
+     which happened first.  A single queue trivially gives you ordering.  Such
+     ordering is crucial to existing applications such as Beagle.  Imagine
+     "mv a b ; mv b a" events without ordering.
+
+   - We'd have to maintain n fd's and n internal queues with state,
+     versus just one.  It is a lot messier in the kernel.  A single, linear
+     queue is the data structure that makes sense.
+
+   - User-space developers prefer the current API.  The Beagle guys, for
+     example, love it.  Trust me, I asked.  It is not a surprise: Who'd want
+     to manage and block on 1000 fd's via select?
+
+   - There would be no way to get out-of-band data.
+
+   - 1024 is still too low.  ;-)
+
+   When you talk about designing a file change notification system that
+   scales to 1000s of directories, juggling 1000s of fd's just does not seem
+   the right interface.  It is too heavy.
+
+   Additionally, it _is_ possible to create more than one instance and
+   juggle more than one queue and thus more than one associated fd.  There
+   need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
+   process can easily want more than one queue.
+
+Q:
+   Why the system call approach?
+
+A:
+   The poor user-space interface is the second biggest problem with dnotify.
+   Signals are a terrible, terrible interface for file notification.  Or for
+   anything, for that matter.  The ideal solution, from all perspectives, is a
+   file descriptor-based one that allows basic file I/O and poll/select.
+   Obtaining the fd and managing the watches could have been done either via a
+   device file or a family of new system calls.  We decided to implement a
+   family of system calls because that is the preferred approach for new kernel
+   interfaces.  The only real difference was whether we wanted to use open(2)
+   and ioctl(2) or a couple of new system calls.  System calls beat ioctls.
+
diff --git a/Documentation/filesystems/inotify.txt b/Documentation/filesystems/inotify.txt
deleted file mode 100644
index 51f61db787fb..000000000000
--- a/Documentation/filesystems/inotify.txt
+++ /dev/null
@@ -1,79 +0,0 @@
-				   inotify
-	    a powerful yet simple file change notification system
-
-
-
-Document started 15 Mar 2005 by Robert Love <rml@novell.com>
-Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>
-	--Deleted obsoleted interface, just refer to manpages for user interface.
-
-(i) Rationale
-
-Q: What is the design decision behind not tying the watch to the open fd of
-   the watched object?
-
-A: Watches are associated with an open inotify device, not an open file.
-   This solves the primary problem with dnotify: keeping the file open pins
-   the file and thus, worse, pins the mount.  Dnotify is therefore infeasible
-   for use on a desktop system with removable media as the media cannot be
-   unmounted.  Watching a file should not require that it be open.
-
-Q: What is the design decision behind using an-fd-per-instance as opposed to
-   an fd-per-watch?
-
-A: An fd-per-watch quickly consumes more file descriptors than are allowed,
-   more fd's than are feasible to manage, and more fd's than are optimally
-   select()-able.  Yes, root can bump the per-process fd limit and yes, users
-   can use epoll, but requiring both is a silly and extraneous requirement.
-   A watch consumes less memory than an open file, separating the number
-   spaces is thus sensible.  The current design is what user-space developers
-   want: Users initialize inotify, once, and add n watches, requiring but one
-   fd and no twiddling with fd limits.  Initializing an inotify instance two
-   thousand times is silly.  If we can implement user-space's preferences 
-   cleanly--and we can, the idr layer makes stuff like this trivial--then we 
-   should.
-
-   There are other good arguments.  With a single fd, there is a single
-   item to block on, which is mapped to a single queue of events.  The single
-   fd returns all watch events and also any potential out-of-band data.  If
-   every fd was a separate watch,
-
-   - There would be no way to get event ordering.  Events on file foo and
-     file bar would pop poll() on both fd's, but there would be no way to tell
-     which happened first.  A single queue trivially gives you ordering.  Such
-     ordering is crucial to existing applications such as Beagle.  Imagine
-     "mv a b ; mv b a" events without ordering.
-
-   - We'd have to maintain n fd's and n internal queues with state,
-     versus just one.  It is a lot messier in the kernel.  A single, linear
-     queue is the data structure that makes sense.
-
-   - User-space developers prefer the current API.  The Beagle guys, for
-     example, love it.  Trust me, I asked.  It is not a surprise: Who'd want
-     to manage and block on 1000 fd's via select?
-
-   - No way to get out of band data.
-
-   - 1024 is still too low.  ;-)
-
-   When you talk about designing a file change notification system that
-   scales to 1000s of directories, juggling 1000s of fd's just does not seem
-   the right interface.  It is too heavy.
-
-   Additionally, it _is_ possible to  more than one instance  and
-   juggle more than one queue and thus more than one associated fd.  There
-   need not be a one-fd-per-process mapping; it is one-fd-per-queue and a
-   process can easily want more than one queue.
-
-Q: Why the system call approach?
-
-A: The poor user-space interface is the second biggest problem with dnotify.
-   Signals are a terrible, terrible interface for file notification.  Or for
-   anything, for that matter.  The ideal solution, from all perspectives, is a
-   file descriptor-based one that allows basic file I/O and poll/select.
-   Obtaining the fd and managing the watches could have been done either via a
-   device file or a family of new system calls.  We decided to implement a
-   family of system calls because that is the preferred approach for new kernel
-   interfaces.  The only real difference was whether we wanted to use open(2)
-   and ioctl(2) or a couple of new system calls.  System calls beat ioctls.
-
-- 


From 76f216855b6bd1027e236b29cd7fece7336c37eb Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:11 +0100
Subject: docs: filesystems: convert isofs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Add table markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/ec16dc09d0c23bb0c1af3d3f33a96896083a1d36.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |  1 +
 Documentation/filesystems/isofs.rst | 64 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/isofs.txt | 48 ----------------------------
 3 files changed, 65 insertions(+), 48 deletions(-)
 create mode 100644 Documentation/filesystems/isofs.rst
 delete mode 100644 Documentation/filesystems/isofs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 5a737722652c..8c8813ada53f 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -71,6 +71,7 @@ Documentation for filesystem implementations.
    hfsplus
    hpfs
    inotify
+   isofs
    fuse
    overlayfs
    virtiofs
diff --git a/Documentation/filesystems/isofs.rst b/Documentation/filesystems/isofs.rst
new file mode 100644
index 000000000000..08fd469091d4
--- /dev/null
+++ b/Documentation/filesystems/isofs.rst
@@ -0,0 +1,64 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+ISO9660 Filesystem
+==================
+
+Mount options that are the same as for msdos and vfat partitions.
+
+  =========	========================================================
+  gid=nnn	All files in the partition will be in group nnn.
+  uid=nnn	All files in the partition will be owned by user id nnn.
+  umask=nnn	The permission mask (see umask(1)) for the partition.
+  =========	========================================================
+
+Mount options that are the same as for vfat partitions. These are only useful
+when using discs encoded using Microsoft's Joliet extensions.
+
+ ==============	=============================================================
+ iocharset=name Character set to use for converting from Unicode to
+		ASCII.  Joliet filenames are stored in Unicode format, but
+		Unix for the most part doesn't know how to deal with Unicode.
+		There is also an option of doing UTF-8 translations with the
+		utf8 option.
+  utf8          Encode Unicode names in UTF-8 format. Default is no.
+ ==============	=============================================================
+
+Mount options unique to the isofs filesystem.
+
+ ================= ============================================================
+  block=512        Set the block size for the disk to 512 bytes
+  block=1024       Set the block size for the disk to 1024 bytes
+  block=2048       Set the block size for the disk to 2048 bytes
+  check=relaxed    Matches filenames with different cases
+  check=strict     Matches only filenames with the exact same case
+  cruft            Try to handle badly formatted CDs.
+  map=off          Do not map non-Rock Ridge filenames to lower case
+  map=normal       Map non-Rock Ridge filenames to lower case
+  map=acorn        As map=normal but also apply Acorn extensions if present
+  mode=xxx         Sets the permissions on files to xxx unless Rock Ridge
+		   extensions set the permissions otherwise
+  dmode=xxx        Sets the permissions on directories to xxx unless Rock Ridge
+		   extensions set the permissions otherwise
+  overriderockperm Set permissions on files and directories according to
+		   'mode' and 'dmode' even though Rock Ridge extensions are
+		   present.
+  nojoliet         Ignore Joliet extensions if they are present.
+  norock           Ignore Rock Ridge extensions if they are present.
+  hide		   Completely strip hidden files from the file system.
+  showassoc	   Show files marked with the 'associated' bit
+  unhide	   Deprecated; showing hidden files is now the default.
+		   If given, it is a synonym for 'showassoc', which will
+		   recreate the previous unhide behavior.
+  session=x        Select the session number on a multisession CD
+  sbsector=xxx     Session begins from sector xxx
+ ================= ============================================================
+
+Recommended documents about ISO 9660 standard are located at:
+
+- http://www.y-adagio.com/
+- ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
+
+As the PDF states, "This 2nd Edition of Standard ECMA-119 is technically
+identical with ISO 9660.", so it is a valid and free substitute for the
+official ISO specification.
diff --git a/Documentation/filesystems/isofs.txt b/Documentation/filesystems/isofs.txt
deleted file mode 100644
index ba0a93384de0..000000000000
--- a/Documentation/filesystems/isofs.txt
+++ /dev/null
@@ -1,48 +0,0 @@
-Mount options that are the same as for msdos and vfat partitions.
-
-  gid=nnn	All files in the partition will be in group nnn.
-  uid=nnn	All files in the partition will be owned by user id nnn.
-  umask=nnn	The permission mask (see umask(1)) for the partition.
-
-Mount options that are the same as vfat partitions. These are only useful
-when using discs encoded using Microsoft's Joliet extensions.
-  iocharset=name Character set to use for converting from Unicode to
-		ASCII.  Joliet filenames are stored in Unicode format, but
-		Unix for the most part doesn't know how to deal with Unicode.
-		There is also an option of doing UTF-8 translations with the
-		utf8 option.
-  utf8          Encode Unicode names in UTF-8 format. Default is no.
-
-Mount options unique to the isofs filesystem.
-  block=512     Set the block size for the disk to 512 bytes
-  block=1024    Set the block size for the disk to 1024 bytes
-  block=2048    Set the block size for the disk to 2048 bytes
-  check=relaxed Matches filenames with different cases
-  check=strict  Matches only filenames with the exact same case
-  cruft         Try to handle badly formatted CDs.
-  map=off       Do not map non-Rock Ridge filenames to lower case
-  map=normal    Map non-Rock Ridge filenames to lower case
-  map=acorn     As map=normal but also apply Acorn extensions if present
-  mode=xxx      Sets the permissions on files to xxx unless Rock Ridge
-		extensions set the permissions otherwise
-  dmode=xxx     Sets the permissions on directories to xxx unless Rock Ridge
-		extensions set the permissions otherwise
-  overriderockperm Set permissions on files and directories according to
-		'mode' and 'dmode' even though Rock Ridge extensions are
-		present.
-  nojoliet      Ignore Joliet extensions if they are present.
-  norock        Ignore Rock Ridge extensions if they are present.
-  hide		Completely strip hidden files from the file system.
-  showassoc	Show files marked with the 'associated' bit
-  unhide	Deprecated; showing hidden files is now default;
-		If given, it is a synonym for 'showassoc' which will
-		recreate previous unhide behavior
-  session=x     Select number of session on multisession CD
-  sbsector=xxx  Session begins from sector xxx
-
-Recommended documents about ISO 9660 standard are located at:
-http://www.y-adagio.com/
-ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
-Quoting from the PDF "This 2nd Edition of Standard ECMA-119 is technically 
-identical with ISO 9660.", so it is a valid and gratis substitute of the
-official ISO specification.
-- 


From 2640c19dcab0f6530007dfb4ee5870f5d61b0772 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:12 +0100
Subject: docs: filesystems: convert nilfs2.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust document title;
- Mark literal blocks as such;
- use :field: markup;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/f7989ca501585f5990fffd2d365cfca4fe9fdd6f.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst  |   3 +-
 Documentation/filesystems/nilfs2.rst | 286 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/nilfs2.txt | 276 ---------------------------------
 3 files changed, 288 insertions(+), 277 deletions(-)
 create mode 100644 Documentation/filesystems/nilfs2.rst
 delete mode 100644 Documentation/filesystems/nilfs2.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 8c8813ada53f..01587704fcc9 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -70,9 +70,10 @@ Documentation for filesystem implementations.
    hfs
    hfsplus
    hpfs
+   fuse
    inotify
    isofs
-   fuse
+   nilfs2
    overlayfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/nilfs2.rst b/Documentation/filesystems/nilfs2.rst
new file mode 100644
index 000000000000..6c49f04e9e0a
--- /dev/null
+++ b/Documentation/filesystems/nilfs2.rst
@@ -0,0 +1,286 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======
+NILFS2
+======
+
+NILFS2 is a log-structured file system (LFS) supporting continuous
+snapshotting.  In addition to versioning the entire file system,
+users can even restore files that were mistakenly overwritten or
+destroyed just a few seconds ago.  Since NILFS2 keeps consistency
+like a conventional LFS, it achieves quick recovery after system
+crashes.
+
+NILFS2 creates a checkpoint every few seconds or on each synchronous
+write (unless nothing has changed).  Users can select significant
+versions among the continuously created checkpoints and change them
+into snapshots, which are preserved until they are changed back to
+checkpoints.
+
+The number of snapshots is limited only by the free space on the
+volume.  Each snapshot is mountable as a read-only file system
+concurrently with its writable mount, which is convenient for online
+backup.
+
+The userland tools are included in the nilfs-utils package, which is
+available from the download page listed below.  At least "mkfs.nilfs2",
+"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (the so-called
+cleaner or garbage collector) are required.  Details on the tools are
+described in the man pages included in the package.
+
+:Project web page:    https://nilfs.sourceforge.io/
+:Download page:       https://nilfs.sourceforge.io/en/download.html
+:List info:           http://vger.kernel.org/vger-lists.html#linux-nilfs
+
+Caveats
+=======
+
+Features which NILFS2 does not support yet:
+
+	- atime
+	- extended attributes
+	- POSIX ACLs
+	- quotas
+	- fsck
+	- defragmentation
+
+Mount options
+=============
+
+NILFS2 supports the following mount options:
+(*) == default
+
+======================= =======================================================
+barrier(*)		This enables/disables the use of write barriers.  This
+nobarrier		requires an IO stack which can support barriers, and
+			if nilfs gets an error on a barrier write, it will
+			disable barriers again and print a warning.
+errors=continue		Keep going on a filesystem error.
+errors=remount-ro(*)	Remount the filesystem read-only on an error.
+errors=panic		Panic and halt the machine if an error occurs.
+cp=n			Specify the checkpoint number of the snapshot to be
+			mounted.  Checkpoints and snapshots are listed by the
+			lscp user command.  Only checkpoints marked as
+			snapshots are mountable with this option.  Snapshots
+			are read-only, so a read-only mount option is required.
+order=relaxed(*)	Apply relaxed order semantics that allow modified data
+			blocks to be written to disk without making a
+			checkpoint if no metadata update is in progress.  This
+			mode is equivalent to the ordered data mode of the ext3
+			filesystem, except that updates on data blocks still
+			preserve atomicity.  This improves synchronous write
+			performance for overwriting.
+order=strict		Apply strict in-order semantics that preserve the
+			sequence of all file operations, including overwriting
+			of data blocks.  This guarantees that no event
+			overtakes an earlier one in the recovered file system
+			after a crash.
+norecovery		Disable recovery of the filesystem on mount.
+			This disables every write access on the device for
+			read-only mounts or snapshots.  This option will fail
+			for r/w mounts on an unclean volume.
+discard			This enables/disables the use of discard/TRIM commands.
+nodiscard(*)		When enabled, discard/TRIM commands are sent to the
+			underlying block device when blocks are freed.  This is
+			useful for SSD devices and sparse/thinly-provisioned LUNs.
+======================= =======================================================
+
+Ioctls
+======
+
+Some NILFS2-specific functionality can be accessed by applications
+through the ioctl(2) system call interface.  All NILFS2-specific
+ioctls are listed in the table below.
+
+Table of NILFS2 specific ioctls:
+
+ ============================== ===============================================
+ Ioctl			        Description
+ ============================== ===============================================
+ NILFS_IOCTL_CHANGE_CPMODE      Change the mode of a given checkpoint between
+			        checkpoint and snapshot state.  This ioctl is
+			        used by the chcp and mkcp utilities.
+
+ NILFS_IOCTL_DELETE_CHECKPOINT  Remove a checkpoint from a NILFS2 file system.
+			        This ioctl is used by the rmcp utility.
+
+ NILFS_IOCTL_GET_CPINFO         Return info about requested checkpoints.  This
+			        ioctl is used by the lscp utility and the
+			        nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_CPSTAT         Return checkpoint statistics.  This ioctl is
+			        used by the lscp and rmcp utilities and the
+			        nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_SUINFO         Return segment usage info about requested
+			        segments.  This ioctl is used by the lssu and
+			        nilfs_resize utilities and the nilfs_cleanerd
+			        daemon.
+
+ NILFS_IOCTL_SET_SUINFO         Modify segment usage info of requested
+				segments.  This ioctl is used by the
+				nilfs_cleanerd daemon to skip unnecessary
+				cleaning of segments and to reduce the
+				performance penalty or flash wear caused by
+				redundant moves of in-use blocks.
+
+ NILFS_IOCTL_GET_SUSTAT         Return segment usage statistics.  This ioctl
+			        is used by the lssu and nilfs_resize utilities
+			        and the nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_VINFO          Return information on virtual block addresses.
+			        This ioctl is used by the nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_GET_BDESCS         Return information about descriptors of disk
+			        block numbers.  This ioctl is used by the
+			        nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_CLEAN_SEGMENTS     Perform a garbage collection operation using
+			        parameters requested from userspace.  This
+			        ioctl is used by the nilfs_cleanerd daemon.
+
+ NILFS_IOCTL_SYNC               Make a checkpoint.  This ioctl is used by the
+			        mkcp utility.
+
+ NILFS_IOCTL_RESIZE             Resize a NILFS2 volume.  This ioctl is used
+			        by the nilfs_resize utility.
+
+ NILFS_IOCTL_SET_ALLOC_RANGE    Define the lower and upper limits of segments
+			        in bytes.  This ioctl is used by the
+			        nilfs_resize utility.
+ ============================== ===============================================
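+
+As an illustration of this interface, an application could request a
+checkpoint itself with NILFS_IOCTL_SYNC.  The following is a minimal
+sketch, not code taken from nilfs-utils.  It assumes that the uapi
+header ``<linux/nilfs2_api.h>`` is installed and that the ioctl stores
+the number of the new checkpoint in a ``__u64``; the helper name
+``nilfs_make_checkpoint()`` is hypothetical:
+
+.. code-block:: c
+
+   #include <errno.h>
+   #include <fcntl.h>
+   #include <stdint.h>
+   #include <sys/ioctl.h>
+   #include <unistd.h>
+   #include <linux/types.h>
+   #include <linux/nilfs2_api.h>   /* assumed to define NILFS_IOCTL_SYNC */
+
+   /*
+    * Hypothetical helper: make a checkpoint on the NILFS2 volume that
+    * contains 'path'.  On success, store the checkpoint number in *cno
+    * and return 0; on failure, return -1 with errno set.
+    */
+   int nilfs_make_checkpoint(const char *path, uint64_t *cno)
+   {
+       __u64 n = 0;
+       int fd, ret, saved_errno;
+
+       fd = open(path, O_RDONLY);
+       if (fd < 0)
+           return -1;
+
+       ret = ioctl(fd, NILFS_IOCTL_SYNC, &n);
+       saved_errno = errno;   /* close() must not clobber the ioctl error */
+       close(fd);
+       if (ret < 0) {
+           errno = saved_errno;
+           return -1;
+       }
+       *cno = n;
+       return 0;
+   }
+
+On a file system other than NILFS2 the ioctl is expected to fail
+(typically with ENOTTY), so callers should always check the return
+value.  From the command line, the mkcp utility performs the same
+operation.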
+
+NILFS2 usage
+============
+
+To use nilfs2 as a local file system, simply::
+
+ # mkfs -t nilfs2 /dev/block_device
+ # mount -t nilfs2 /dev/block_device /dir
+
+This will also invoke the cleaner through the mount helper program
+(mount.nilfs2).
+
+Checkpoints and snapshots are managed by the following commands.
+Their manpages are included in the nilfs-utils package above.
+
+  ====     ===========================================================
+  lscp     list checkpoints or snapshots.
+  mkcp     make a checkpoint or a snapshot.
+  chcp     change an existing checkpoint to a snapshot or vice versa.
+  rmcp     invalidate specified checkpoint(s).
+  ====     ===========================================================
+
+To mount a snapshot::
+
+ # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir
+
+where <cno> is the checkpoint number of the snapshot.
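+
+A program can request the same snapshot mount through mount(2).  The
+following minimal sketch assumes the caller has the privileges that
+mount(2) requires; the helper name ``mount_nilfs_snapshot()`` is
+hypothetical.  It bypasses the mount.nilfs2 helper, and, as described
+for the cp=n option, MS_RDONLY must accompany the checkpoint number:
+
+.. code-block:: c
+
+   #include <stdint.h>
+   #include <stdio.h>
+   #include <sys/mount.h>
+
+   /*
+    * Hypothetical helper: mount the snapshot with checkpoint number
+    * 'cno' from block device 'dev' on directory 'dir'.
+    * Returns 0 on success, or -1 with errno set.
+    */
+   int mount_nilfs_snapshot(const char *dev, const char *dir, uint64_t cno)
+   {
+       char opts[32];
+
+       /* Same as "-o cp=<cno>" on the mount command line. */
+       snprintf(opts, sizeof(opts), "cp=%llu", (unsigned long long)cno);
+
+       /* Snapshots are read-only, so MS_RDONLY (the "-r" flag) is required. */
+       return mount(dev, dir, "nilfs2", MS_RDONLY, opts);
+   }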
+
+To unmount the NILFS2 mount point or snapshot, simply::
+
+ # umount /dir
+
+Then, the cleaner daemon is automatically shut down by the umount
+helper program (umount.nilfs2).
+
+Disk format
+===========
+
+A nilfs2 volume is equally divided into a number of segments except
+for the super block (SB) and segment #0.  A segment is the container
+of logs.  Each log is composed of summary information blocks, payload
+blocks, and an optional super root block (SR)::
+
+   ______________________________________________________
+  | |SB| | Segment | Segment | Segment | ... | Segment | |
+  |_|__|_|____0____|____1____|____2____|_____|____N____|_|
+  0 +1K +4K       +8M       +16M      +24M  +(8MB x N)
+       .             .            (Typical offsets for 4KB-block)
+    .                  .
+  .______________________.
+  | log | log |... | log |
+  |__1__|__2__|____|__m__|
+        .       .
+      .               .
+    .                       .
+  .______________________________.
+  | Summary | Payload blocks  |SR|
+  |_blocks__|_________________|__|
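+
+The offsets in the figure follow from simple arithmetic.  The sketch
+below assumes the typical geometry shown (4KB blocks and 2048 blocks,
+i.e. 8MB, per segment) and that segment #0 starts at the first data
+block after the super block.  The function name is hypothetical; on a
+real volume the geometry comes from super block fields such as
+s_blocks_per_segment and s_first_data_block (see nilfs2_ondisk.h):
+
+.. code-block:: c
+
+   #include <stdint.h>
+
+   /*
+    * Hypothetical helper: compute the block range [*start, *end] of
+    * segment 'segnum'.  Every segment spans blocks_per_segment blocks,
+    * except that segment #0 begins after the super block area.
+    */
+   void segment_range(uint64_t segnum, uint64_t blocks_per_segment,
+                      uint64_t first_data_block,
+                      uint64_t *start, uint64_t *end)
+   {
+       *start = segnum * blocks_per_segment;
+       *end = *start + blocks_per_segment - 1;
+       if (segnum == 0)
+           *start = first_data_block;   /* skip the super block area */
+   }
+
+For example, segment_range(1, 2048, 1, &s, &e) yields blocks 2048 to
+4095, and 2048 * 4096 bytes is the "+8M" mark in the figure.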
+
+The payload blocks are organized per file, and each file consists of
+data blocks and B-tree node blocks::
+
+    |<---       File-A        --->|<---       File-B        --->|
+   _______________________________________________________________
+    | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ...
+   _|_____________|_______________|_____________|_______________|_
+
+
+Since only the modified blocks are written in the log, a log may
+contain files that have no data blocks or no B-tree node blocks.
+
+The organization of the blocks is recorded in the summary information
+blocks, which contain a header structure (nilfs_segment_summary),
+per-file structures (nilfs_finfo), and per-block structures (nilfs_binfo)::
+
+  _________________________________________________________________________
+ | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
+ |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___
+
+
+The logs include regular files, directory files, symbolic link files
+and several metadata files.  The metadata files are the files used
+to maintain file system metadata.  The current version of NILFS2 uses
+the following metadata files::
+
+ 1) Inode file (ifile)             -- Stores on-disk inodes
+ 2) Checkpoint file (cpfile)       -- Stores checkpoints
+ 3) Segment usage file (sufile)    -- Stores allocation state of segments
+ 4) Data address translation file  -- Maps virtual block numbers to usual
+    (DAT)                             block numbers.  This file serves to
+                                      make on-disk blocks relocatable.
+
+The following figure shows a typical organization of the logs::
+
+  _________________________________________________________________________
+ | Summary | regular file | file  | ... | ifile | cpfile | sufile | DAT |SR|
+ |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__|
+
+
+To stride over segment boundaries, this sequence of files may be split
+into multiple logs.  A sequence of logs that should be treated as
+logically one log is delimited with flags marked in the segment
+summary.  The recovery code of nilfs2 looks at this boundary
+information to ensure atomicity of updates.
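+
+The delimiting scheme can be modeled in a few lines.  The sketch below
+is a simplified model, not the kernel's recovery code: it assumes each
+partial log carries a "begin" and an "end" flag, mirroring
+NILFS_SS_LOGBGN and NILFS_SS_LOGEND in nilfs2_ondisk.h, and the flag
+values and the helper name ``complete_prefix()`` are assumptions.  It
+reports how many leading partial logs form complete logical logs; a
+trailing sequence that was begun but never ended, for example because
+of a crash, is not counted:
+
+.. code-block:: c
+
+   #include <stddef.h>
+   #include <stdint.h>
+
+   #define LOG_BEGIN 0x0001   /* assumed: first partial log of a logical log */
+   #define LOG_END   0x0002   /* assumed: last partial log of a logical log */
+
+   /*
+    * Hypothetical helper: return the number of leading partial logs
+    * (out of 'n') that belong to fully delimited logical logs.
+    */
+   size_t complete_prefix(const uint16_t *flags, size_t n)
+   {
+       size_t i, done = 0;
+       int in_log = 0;
+
+       for (i = 0; i < n; i++) {
+           if (flags[i] & LOG_BEGIN)
+               in_log = 1;
+           if (in_log && (flags[i] & LOG_END)) {
+               in_log = 0;
+               done = i + 1;   /* everything up to here is complete */
+           }
+       }
+       return done;
+   }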
+
+The super root block is inserted for every checkpoint.  It includes
+three special inodes: those of the DAT, cpfile, and sufile.  Inodes
+of regular files, directories, symlinks and other special files are
+included in the ifile.  The inode of the ifile itself is included in
+the corresponding checkpoint entry in the cpfile.  Thus, the
+hierarchy among NILFS2 files can be depicted as follows::
+
+  Super block (SB)
+       |
+       v
+  Super root block (the latest cno=xx)
+       |-- DAT
+       |-- sufile
+       `-- cpfile
+              |-- ifile (cno=c1)
+              |-- ifile (cno=c2) ---- file (ino=i1)
+              :        :          |-- file (ino=i2)
+              `-- ifile (cno=xx)  |-- file (ino=i3)
+                                  :        :
+                                  `-- file (ino=yy)
+                                    ( regular file, directory, or symlink )
+
+For details on the format of each file, see nilfs2_ondisk.h in the
+include/uapi/linux directory.
+
+There are no patents or other intellectual property that we protect
+with regard to the design of NILFS2.  It is allowed to replicate the
+design in hopes that other operating systems could share (mount, read,
+write, etc.) data stored in this format.
diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt
deleted file mode 100644
index f2f3f8592a6f..000000000000
--- a/Documentation/filesystems/nilfs2.txt
+++ /dev/null
@@ -1,276 +0,0 @@
-NILFS2
-------
-
-NILFS2 is a log-structured file system (LFS) supporting continuous
-snapshotting.  In addition to versioning capability of the entire file
-system, users can even restore files mistakenly overwritten or
-destroyed just a few seconds ago.  Since NILFS2 can keep consistency
-like conventional LFS, it achieves quick recovery after system
-crashes.
-
-NILFS2 creates a number of checkpoints every few seconds or per
-synchronous write basis (unless there is no change).  Users can select
-significant versions among continuously created checkpoints, and can
-change them into snapshots which will be preserved until they are
-changed back to checkpoints.
-
-There is no limit on the number of snapshots until the volume gets
-full.  Each snapshot is mountable as a read-only file system
-concurrently with its writable mount, and this feature is convenient
-for online backup.
-
-The userland tools are included in nilfs-utils package, which is
-available from the following download page.  At least "mkfs.nilfs2",
-"mount.nilfs2", "umount.nilfs2", and "nilfs_cleanerd" (so called
-cleaner or garbage collector) are required.  Details on the tools are
-described in the man pages included in the package.
-
-Project web page:    https://nilfs.sourceforge.io/
-Download page:       https://nilfs.sourceforge.io/en/download.html
-List info:           http://vger.kernel.org/vger-lists.html#linux-nilfs
-
-Caveats
-=======
-
-Features which NILFS2 does not support yet:
-
-	- atime
-	- extended attributes
-	- POSIX ACLs
-	- quotas
-	- fsck
-	- defragmentation
-
-Mount options
-=============
-
-NILFS2 supports the following mount options:
-(*) == default
-
-barrier(*)		This enables/disables the use of write barriers.  This
-nobarrier		requires an IO stack which can support barriers, and
-			if nilfs gets an error on a barrier write, it will
-			disable again with a warning.
-errors=continue		Keep going on a filesystem error.
-errors=remount-ro(*)	Remount the filesystem read-only on an error.
-errors=panic		Panic and halt the machine if an error occurs.
-cp=n			Specify the checkpoint-number of the snapshot to be
-			mounted.  Checkpoints and snapshots are listed by lscp
-			user command.  Only the checkpoints marked as snapshot
-			are mountable with this option.  Snapshot is read-only,
-			so a read-only mount option must be specified together.
-order=relaxed(*)	Apply relaxed order semantics that allows modified data
-			blocks to be written to disk without making a
-			checkpoint if no metadata update is going.  This mode
-			is equivalent to the ordered data mode of the ext3
-			filesystem except for the updates on data blocks still
-			conserve atomicity.  This will improve synchronous
-			write performance for overwriting.
-order=strict		Apply strict in-order semantics that preserves sequence
-			of all file operations including overwriting of data
-			blocks.  That means, it is guaranteed that no
-			overtaking of events occurs in the recovered file
-			system after a crash.
-norecovery		Disable recovery of the filesystem on mount.
-			This disables every write access on the device for
-			read-only mounts or snapshots.  This option will fail
-			for r/w mounts on an unclean volume.
-discard			This enables/disables the use of discard/TRIM commands.
-nodiscard(*)		The discard/TRIM commands are sent to the underlying
-			block device when blocks are freed.  This is useful
-			for SSD devices and sparse/thinly-provisioned LUNs.
-
-Ioctls
-======
-
-There is some NILFS2 specific functionality which can be accessed by applications
-through the system call interfaces. The list of all NILFS2 specific ioctls are
-shown in the table below.
-
-Table of NILFS2 specific ioctls
-..............................................................................
- Ioctl			        Description
- NILFS_IOCTL_CHANGE_CPMODE      Change mode of given checkpoint between
-			        checkpoint and snapshot state. This ioctl is
-			        used in chcp and mkcp utilities.
-
- NILFS_IOCTL_DELETE_CHECKPOINT  Remove checkpoint from NILFS2 file system.
-			        This ioctl is used in rmcp utility.
-
- NILFS_IOCTL_GET_CPINFO         Return info about requested checkpoints. This
-			        ioctl is used in lscp utility and by
-			        nilfs_cleanerd daemon.
-
- NILFS_IOCTL_GET_CPSTAT         Return checkpoints statistics. This ioctl is
-			        used by lscp, rmcp utilities and by
-			        nilfs_cleanerd daemon.
-
- NILFS_IOCTL_GET_SUINFO         Return segment usage info about requested
-			        segments. This ioctl is used in lssu,
-			        nilfs_resize utilities and by nilfs_cleanerd
-			        daemon.
-
- NILFS_IOCTL_SET_SUINFO         Modify segment usage info of requested
-				segments. This ioctl is used by
-				nilfs_cleanerd daemon to skip unnecessary
-				cleaning operation of segments and reduce
-				performance penalty or wear of flash device
-				due to redundant move of in-use blocks.
-
- NILFS_IOCTL_GET_SUSTAT         Return segment usage statistics. This ioctl
-			        is used in lssu, nilfs_resize utilities and
-			        by nilfs_cleanerd daemon.
-
- NILFS_IOCTL_GET_VINFO          Return information on virtual block addresses.
-			        This ioctl is used by nilfs_cleanerd daemon.
-
- NILFS_IOCTL_GET_BDESCS         Return information about descriptors of disk
-			        block numbers. This ioctl is used by
-			        nilfs_cleanerd daemon.
-
- NILFS_IOCTL_CLEAN_SEGMENTS     Do garbage collection operation in the
-			        environment of requested parameters from
-			        userspace. This ioctl is used by
-			        nilfs_cleanerd daemon.
-
- NILFS_IOCTL_SYNC               Make a checkpoint. This ioctl is used in
-			        mkcp utility.
-
- NILFS_IOCTL_RESIZE             Resize NILFS2 volume. This ioctl is used
-			        by nilfs_resize utility.
-
- NILFS_IOCTL_SET_ALLOC_RANGE    Define lower limit of segments in bytes and
-			        upper limit of segments in bytes. This ioctl
-			        is used by nilfs_resize utility.
-
-NILFS2 usage
-============
-
-To use nilfs2 as a local file system, simply:
-
- # mkfs -t nilfs2 /dev/block_device
- # mount -t nilfs2 /dev/block_device /dir
-
-This will also invoke the cleaner through the mount helper program
-(mount.nilfs2).
-
-Checkpoints and snapshots are managed by the following commands.
-Their manpages are included in the nilfs-utils package above.
-
-  lscp     list checkpoints or snapshots.
-  mkcp     make a checkpoint or a snapshot.
-  chcp     change an existing checkpoint to a snapshot or vice versa.
-  rmcp     invalidate specified checkpoint(s).
-
-To mount a snapshot,
-
- # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir
-
-where <cno> is the checkpoint number of the snapshot.
-
-To unmount the NILFS2 mount point or snapshot, simply:
-
- # umount /dir
-
-Then, the cleaner daemon is automatically shut down by the umount
-helper program (umount.nilfs2).
-
-Disk format
-===========
-
-A nilfs2 volume is equally divided into a number of segments except
-for the super block (SB) and segment #0.  A segment is the container
-of logs.  Each log is composed of summary information blocks, payload
-blocks, and an optional super root block (SR):
-
-   ______________________________________________________
-  | |SB| | Segment | Segment | Segment | ... | Segment | |
-  |_|__|_|____0____|____1____|____2____|_____|____N____|_|
-  0 +1K +4K       +8M       +16M      +24M  +(8MB x N)
-       .             .            (Typical offsets for 4KB-block)
-    .                  .
-  .______________________.
-  | log | log |... | log |
-  |__1__|__2__|____|__m__|
-        .       .
-      .               .
-    .                       .
-  .______________________________.
-  | Summary | Payload blocks  |SR|
-  |_blocks__|_________________|__|
-
-The payload blocks are organized per file, and each file consists of
-data blocks and B-tree node blocks:
-
-    |<---       File-A        --->|<---       File-B        --->|
-   _______________________________________________________________
-    | Data blocks | B-tree blocks | Data blocks | B-tree blocks | ...
-   _|_____________|_______________|_____________|_______________|_
-
-
-Since only the modified blocks are written in the log, it may have
-files without data blocks or B-tree node blocks.
-
-The organization of the blocks is recorded in the summary information
-blocks, which contains a header structure (nilfs_segment_summary), per
-file structures (nilfs_finfo), and per block structures (nilfs_binfo):
-
-  _________________________________________________________________________
- | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |...
- |_blocks__|___A___|_(A,1)_|_____|(A,Na)_|___B___|_(B,1)_|_____|(B,Nb)_|___
-
-
-The logs include regular files, directory files, symbolic link files
-and several meta data files.  The mata data files are the files used
-to maintain file system meta data.  The current version of NILFS2 uses
-the following meta data files:
-
- 1) Inode file (ifile)             -- Stores on-disk inodes
- 2) Checkpoint file (cpfile)       -- Stores checkpoints
- 3) Segment usage file (sufile)    -- Stores allocation state of segments
- 4) Data address translation file  -- Maps virtual block numbers to usual
-    (DAT)                             block numbers.  This file serves to
-                                      make on-disk blocks relocatable.
-
-The following figure shows a typical organization of the logs:
-
-  _________________________________________________________________________
- | Summary | regular file | file  | ... | ifile | cpfile | sufile | DAT |SR|
- |_blocks__|_or_directory_|_______|_____|_______|________|________|_____|__|
-
-
-To stride over segment boundaries, this sequence of files may be split
-into multiple logs.  The sequence of logs that should be treated as
-logically one log, is delimited with flags marked in the segment
-summary.  The recovery code of nilfs2 looks this boundary information
-to ensure atomicity of updates.
-
-The super root block is inserted for every checkpoints.  It includes
-three special inodes, inodes for the DAT, cpfile, and sufile.  Inodes
-of regular files, directories, symlinks and other special files, are
-included in the ifile.  The inode of ifile itself is included in the
-corresponding checkpoint entry in the cpfile.  Thus, the hierarchy
-among NILFS2 files can be depicted as follows:
-
-  Super block (SB)
-       |
-       v
-  Super root block (the latest cno=xx)
-       |-- DAT
-       |-- sufile
-       `-- cpfile
-              |-- ifile (cno=c1)
-              |-- ifile (cno=c2) ---- file (ino=i1)
-              :        :          |-- file (ino=i2)
-              `-- ifile (cno=xx)  |-- file (ino=i3)
-                                  :        :
-                                  `-- file (ino=yy)
-                                    ( regular file, directory, or symlink )
-
-For detail on the format of each file, please see nilfs2_ondisk.h
-located at include/uapi/linux directory.
-
-There are no patents or other intellectual property that we protect
-with regard to the design of NILFS2.  It is allowed to replicate the
-design in hopes that other operating systems could share (mount, read,
-write, etc.) data stored in this format.
-- 


From 461f2c8f13fcc0d349e4acac46aacf63dbeb34ca Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:13 +0100
Subject: docs: filesystems: convert ntfs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Comment out text-only ToC;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/f09ca6c9bdd4e7aa7208f3dba0b8753080b38d03.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   3 +-
 Documentation/filesystems/ntfs.rst  | 466 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ntfs.txt  | 451 ----------------------------------
 3 files changed, 468 insertions(+), 452 deletions(-)
 create mode 100644 Documentation/filesystems/ntfs.rst
 delete mode 100644 Documentation/filesystems/ntfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 01587704fcc9..62be53c4755d 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -74,7 +74,8 @@ Documentation for filesystem implementations.
    inotify
    isofs
    nilfs2
+   nfs/index
+   ntfs
    overlayfs
    virtiofs
    vfat
-   nfs/index
diff --git a/Documentation/filesystems/ntfs.rst b/Documentation/filesystems/ntfs.rst
new file mode 100644
index 000000000000..5bb093a26485
--- /dev/null
+++ b/Documentation/filesystems/ntfs.rst
@@ -0,0 +1,466 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+The Linux NTFS filesystem driver
+================================
+
+
+.. Table of contents
+
+   - Overview
+   - Web site
+   - Features
+   - Supported mount options
+   - Known bugs and (mis-)features
+   - Using NTFS volume and stripe sets
+     - The Device-Mapper driver
+     - The Software RAID / MD driver
+     - Limitations when using the MD driver
+
+
+Overview
+========
+
+Linux-NTFS comes with a number of user-space programs known as ntfsprogs.
+These include mkntfs, a full-featured NTFS filesystem format utility;
+ntfsundelete, used for recovering files that were unintentionally deleted
+from an NTFS volume; and ntfsresize, which is used to resize an NTFS
+partition.  See the web site for more information.
+
+To mount an NTFS 1.2/3.x (Windows NT4/2000/XP/2003) volume, use the file
+system type 'ntfs'.  The driver currently supports read-only mode (with no
+fault-tolerance, encryption or journalling) and very limited, but safe, write
+support.
+
+For fault tolerance and RAID support (i.e. volume and stripe sets), you can
+use the kernel's Software RAID / MD driver.  See the section "Using NTFS
+volume and stripe sets" for details.
+
+
+Web site
+========
+
+There is plenty of additional information on the linux-ntfs web site
+at http://www.linux-ntfs.org/, such as a comprehensive FAQ,
+documentation on the NTFS on-disk format, information on the
+Linux-NTFS userspace utilities, etc.
+
+
+Features
+========
+
+- This is a complete rewrite of the NTFS driver that used to be in the 2.4 and
+  earlier kernels.  This new driver implements NTFS read support and is
+  functionally equivalent to the old ntfs driver and it also implements limited
+  write support.  The biggest limitation at present is that files/directories
+  cannot be created or deleted.  See below for the list of write features that
+  are so far supported.  Another limitation is that writing to compressed files
+  is not implemented at all.  Also, neither read nor write access to encrypted
+  files is so far implemented.
+- The new driver has full support for sparse files on NTFS 3.x volumes,
+  which the old driver did not handle well.
+- The new driver supports execution of binaries, since mmap() is now
+  supported.
+- The new driver supports loopback mounting of files on NTFS, which is used
+  by some Linux distributions to let the user run Linux from an NTFS
+  partition: a large file is created while in Windows, then loopback
+  mounted while in Linux, and a Linux filesystem is created on it onto
+  which Linux is installed.
+- A comparison of the two drivers using::
+
+	time find . -type f -exec md5sum "{}" \;
+
+  run three times in sequence with each driver (after a reboot) on a 1.4GiB
+  NTFS partition, showed the new driver to be 20% faster in total time elapsed
+  (from 9:43 minutes on average down to 7:53).  The time spent in user space
+  was unchanged but the time spent in the kernel was decreased by a factor of
+  2.5 (from 85 CPU seconds down to 33).
+- The driver does not support short file names in general.  For backwards
+  compatibility, we implement access to files using their short file names if
+  they exist.  The driver will not create short file names however, and a
+  rename will discard any existing short file name.
+- The new driver supports exporting of mounted NTFS volumes via NFS.
+- The new driver supports async io (aio).
+- The new driver supports fsync(2), fdatasync(2), and msync(2).
+- The new driver supports readv(2) and writev(2).
+- The new driver supports access time updates (including mtime and ctime).
+- The new driver supports truncate(2) and open(2) with O_TRUNC.  But at present
+  only very limited support for highly fragmented files, i.e. ones which have
+  their data attribute split across multiple extents, is included.  Another
+  limitation is that at present truncate(2) will never create sparse files,
+  since to mark a file sparse we need to modify the directory entry for the
+  file and we do not implement directory modifications yet.
+- The new driver supports write(2) which can both overwrite existing data and
+  extend the file size so that you can write beyond the existing data.  Also,
+  writing into sparse regions is supported and the holes are filled in with
+  clusters.  But at present only limited support for highly fragmented files,
+  i.e. ones which have their data attribute split across multiple extents, is
+  included.  Another limitation is that write(2) will never create sparse
+  files, since to mark a file sparse we need to modify the directory entry for
+  the file and we do not implement directory modifications yet.
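+
+The write(2) behaviour described in the last item can be exercised
+with ordinary POSIX calls.  This is a generic sketch, not specific to
+the NTFS driver, and the helper name ``write_at()`` is hypothetical.
+Per the item above, this driver does not turn the gap into a sparse
+region, but reads from the gap return zeros either way:
+
+.. code-block:: c
+
+   #define _POSIX_C_SOURCE 200809L
+   #include <fcntl.h>
+   #include <sys/stat.h>
+   #include <unistd.h>
+
+   /*
+    * Hypothetical helper: write 'len' bytes at 'offset' in the file at
+    * 'path' (created if missing).  'offset' may lie beyond the current
+    * end of file; any gap between the old end of file and 'offset'
+    * reads back as zeros.  Returns the new file size, or -1 on error.
+    */
+   off_t write_at(const char *path, const void *buf, size_t len, off_t offset)
+   {
+       struct stat st;
+       off_t size = -1;
+       int fd = open(path, O_WRONLY | O_CREAT, 0644);
+
+       if (fd < 0)
+           return -1;
+       if (pwrite(fd, buf, len, offset) == (ssize_t)len &&
+           fstat(fd, &st) == 0)
+           size = st.st_size;
+       close(fd);
+       return size;
+   }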
+
+Supported mount options
+=======================
+
+In addition to the generic mount options described by the manual page for the
+mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the
+following mount options:
+
+======================= =======================================================
+iocharset=name		Deprecated option.  Still supported but please use
+			nls=name in the future.  See description for nls=name.
+
+nls=name		Character set to use when returning file names.
+			Unlike VFAT, NTFS suppresses names that contain
+			unconvertible characters.  Note that most character
+			sets contain insufficient characters to represent all
+			possible Unicode characters that can exist on NTFS.
+			To be sure you are not missing any files, you are
+			advised to use nls=utf8 which is capable of
+			representing all Unicode characters.
+
+utf8=<bool>		Option no longer supported.  Currently mapped to
+			nls=utf8 but please use nls=utf8 in the future and
+			make sure utf8 is compiled either as module or into
+			the kernel.  See description for nls=name.
+
+uid=
+gid=
+umask=			Provide default owner, group, and access mode mask.
+			These options work as documented in mount(8).  By
+			default, the files/directories are owned by root and
+			he/she has read and write permissions, as well as
+			browse permission for directories.  No one else has any
+			access permissions.  I.e. the mode on all files is by
+			default rw------- and for directories rwx------, a
+			consequence of the default fmask=0177 and dmask=0077.
+			Using a umask of zero will grant all permissions to
+			everyone, i.e. all files and directories will have mode
+			rwxrwxrwx.
+
+fmask=, dmask=		Instead of specifying umask, which applies to both
+			files and directories, fmask applies only to files and
+			dmask only to directories.
+
+sloppy=<BOOL>		If sloppy is specified, ignore unknown mount options.
+			Otherwise the default behaviour is to abort mount if
+			any unknown options are found.
+
+show_sys_files=<BOOL>	If show_sys_files is specified, show the system files
+			in directory listings.  Otherwise the default behaviour
+			is to hide the system files.
+			Note that even when show_sys_files is specified, "$MFT"
+			will not be visible due to bugs/mis-features in glibc.
+			Further, note that irrespective of show_sys_files, all
+			files are accessible by name, i.e. you can always do
+			``ls -l \$UpCase``, for example, to specifically show
+			the system file containing the Unicode upcase table.
+
+case_sensitive=<BOOL>	If case_sensitive is specified, treat all file names as
+			case sensitive and create file names in the POSIX
+			namespace.  Otherwise the default behaviour is to treat
+			file names as case insensitive and to create file names
+			in the WIN32/LONG name space.  Note, the Linux NTFS
+			driver will never create short file names and will
+			remove them on rename/delete of the corresponding long
+			file name.
+			Note that files remain accessible via their short file
+			name, if it exists.  If case_sensitive is specified,
+			you will need to provide the correct case of the short
+			file name.
+
+disable_sparse=<BOOL>	If disable_sparse is specified, creation of sparse
+			regions, i.e. holes, inside files is disabled for the
+			volume (for the duration of this mount only).  By
+			default, creation of sparse regions is enabled, which
+			is consistent with the behaviour of traditional Unix
+			filesystems.
+
+errors=opt		What to do when critical filesystem errors are found.
+			The following values can be used for "opt":
+
+			  ========  =========================================
+			  continue  DEFAULT, try to clean up as much as
+				    possible, e.g. marking a corrupt inode as
+				    bad so it is no longer accessed, and then
+				    continue.
+			  recover   At present, the only supported recovery
+				    is of the boot sector from its backup
+				    copy.  On a read-only mount, the recovery
+				    is done in memory only and not written to
+				    disk.
+			  ========  =========================================
+
+			Note that the options are additive, i.e. specifying::
+
+			   errors=continue,errors=recover
+
+			means the driver will attempt to recover and, if that
+			fails, it will clean up as much as possible and
+			continue.
+
+mft_zone_multiplier=	Set the MFT zone multiplier for the volume (this
+			setting is not persistent across mounts and can be
+			changed from mount to mount but cannot be changed on
+			remount).  Values of 1 to 4 are allowed, 1 being the
+			default.  The MFT zone multiplier determines how much
+			space is reserved for the MFT on the volume.  If all
+			other space is used up, then the MFT zone will be
+			shrunk dynamically, so this has no impact on the
+			amount of free space.  However, it can have an impact
+			on performance by affecting fragmentation of the MFT.
+			In general, use the default.  If you have a lot of
+			small files, use a higher value.  The values have the
+			following meaning:
+
+			      =====	    =================================
+			      Value	     MFT zone size (% of volume size)
+			      =====	    =================================
+				1		12.5%
+				2		25%
+				3		37.5%
+				4		50%
+			      =====	    =================================
+
+			Note this option is irrelevant for read-only mounts.
+======================= =======================================================
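+
+As an illustration of how the mask options combine, here is a small Python
+sketch.  It is not part of the driver, the helper names are hypothetical,
+and it assumes, as the defaults above imply, that the resulting mode is
+simply 0777 with the mask bits cleared::
+
+    import stat
+
+    def mode_from_mask(mask: int) -> int:
+        """Permission bits left after clearing `mask` from 0777."""
+        return 0o777 & ~mask
+
+    def mode_string(mode: int, is_dir: bool) -> str:
+        """Render permission bits as ls -l does, without the type char."""
+        kind = stat.S_IFDIR if is_dir else stat.S_IFREG
+        return stat.filemode(kind | mode)[1:]
+
+    # The defaults described above: fmask=0177 (files), dmask=0077 (dirs).
+    print(mode_string(mode_from_mask(0o177), is_dir=False))  # rw-------
+    print(mode_string(mode_from_mask(0o077), is_dir=True))   # rwx------
+    # umask=0 applies to both files and directories.
+    print(mode_string(mode_from_mask(0), is_dir=False))      # rwxrwxrwx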
+
+
+Known bugs and (mis-)features
+=============================
+
+- The link count on each directory inode entry is set to 1, due to Linux not
+  supporting directory hard links.  This may well confuse some user space
+  applications, since the directory names will have the same inode numbers.
+  This also speeds up ntfs_read_inode() immensely.  We have not found any
+  problems with this approach so far; if you find one, please let us know.
+
+
+Please send bug reports/comments/feedback/abuse to the Linux-NTFS development
+list at sourceforge: linux-ntfs-dev@lists.sourceforge.net
+
+
+Using NTFS volume and stripe sets
+=================================
+
+For support of volume and stripe sets, you can use either the kernel's
+Device-Mapper driver or the kernel's Software RAID / MD driver.  The former is
+recommended for linear raid, but the latter is required for raid level 5.  For
+striping and mirroring, either driver should work fine.
+
+
+The Device-Mapper driver
+------------------------
+
+You will need to create a table of the components of the volume/stripe set and
+how they fit together and load this into the kernel using the dmsetup utility
+(see man 8 dmsetup).
+
+Linear volume sets, i.e. linear raid, have been tested and work fine.  Even
+though untested, there is no reason why stripe sets, i.e. raid level 0, and
+mirrors, i.e. raid level 1, should not work, too.  Stripes with parity, i.e.
+raid level 5, unfortunately cannot work yet because the current version of the
+Device-Mapper driver does not support raid level 5.  You may be able to use the
+Software RAID / MD driver for raid level 5; see the next section for details.
+
+To create the table describing your volume you will need to know each of its
+components and their sizes in sectors, i.e. multiples of 512-byte blocks.
+
+For NT4 fault tolerant volumes you can obtain the sizes using fdisk.  For
+example, if one of your partitions is /dev/hda2, you would run::
+
+    $ fdisk -ul /dev/hda
+
+    Disk /dev/hda: 81.9 GB, 81964302336 bytes
+    255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
+    Units = sectors of 1 * 512 = 512 bytes
+
+	Device Boot      Start         End      Blocks   Id  System
+	/dev/hda1   *          63     4209029     2104483+  83  Linux
+	/dev/hda2         4209030    37768814    16779892+  86  NTFS
+	/dev/hda3        37768815    46170809     4200997+  83  Linux
+
+From this you can see that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
+33559785 sectors (the End sector is inclusive, hence the + 1).
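+
+The same arithmetic can be applied to every row of the listing.  The Python
+sketch below is only an illustration with a hypothetical function name: it
+assumes the exact ``fdisk -ul`` column layout shown above (including the
+optional ``*`` boot flag between the device name and the start sector) and
+returns each partition's size in 512-byte sectors::
+
+    def partition_sizes(listing: str) -> dict[str, int]:
+        """Map each /dev/... row of an 'fdisk -ul' listing to its size."""
+        sizes = {}
+        for line in listing.splitlines():
+            fields = line.split()
+            if not fields or not fields[0].startswith("/dev/"):
+                continue  # skip headers and blank lines
+            if fields[1] == "*":  # drop the optional boot flag
+                del fields[1]
+            start, end = int(fields[1]), int(fields[2])
+            sizes[fields[0]] = end - start + 1  # End is inclusive
+        return sizes
+
+    listing = """
+    /dev/hda1   *          63     4209029     2104483+  83  Linux
+    /dev/hda2         4209030    37768814    16779892+  86  NTFS
+    /dev/hda3        37768815    46170809     4200997+  83  Linux
+    """
+    print(partition_sizes(listing)["/dev/hda2"])  # 33559785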
+
+For Win2k and later dynamic disks, you can for example use the ldminfo utility
+which is part of the Linux LDM tools (the latest version at the time of
+writing is linux-ldm-0.0.8.tar.bz2).  You can download it from:
+
+	http://www.linux-ntfs.org/
+
+Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
+into it (cd linux-ldm-0.0.8) and change to the test directory (cd test).  You
+will find the precompiled (i386) ldminfo utility there.  NOTE: You will not be
+able to compile this yourself easily, so use the binary version!
+
+Then you would use ldminfo in dump mode to obtain the necessary information::
+
+    $ ./ldminfo --dump /dev/hda
+
+This would dump the LDM database found on /dev/hda which describes all of your
+dynamic disks and all the volumes on them.  At the bottom you will see the
+VOLUME DEFINITIONS section which is all you really need.  You may need to look
+further above to determine which of the disks in the volume definitions is
+which device in Linux.  Hint: Run ldminfo on each of your dynamic disks and
+look at the Disk Id close to the top of the output for each (the PRIVATE HEADER
+section).  You can then find these Disk Ids in the VBLK DATABASE section in the
+<Disk> components where you will get the LDM Name for the disk that is found in
+the VOLUME DEFINITIONS section.
+
+Note that you will also need to enable the LDM driver in the Linux kernel.  If
+your distribution did not enable it, you will need to recompile the kernel with
+it enabled.  This will create the LDM partitions on each device at boot time.
+You would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3,
+etc.) in the Device-Mapper table.
+
+You can also bypass using the LDM driver by using the main device (e.g.
+/dev/hda) and then using the offsets of the LDM partitions into this device as
+the "Start sector of device" when creating the table.  Once again ldminfo would
+give you the correct information to do this.
+
+Once you know all your devices and their sizes, writing the table is easy.
+
+For a linear raid the table would look like this (note all values are in
+512-byte sectors)::
+
+    # Offset into   Size of this   Raid type   Device      Start sector
+    # volume        device                                 of device
+    0               1028161        linear      /dev/hda1   0
+    1028161         3903762        linear      /dev/hdb2   0
+    4931923         2103211        linear      /dev/hdc1   0
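+
+Each offset in the first column is the running total of the sizes of the
+preceding components.  The Python sketch below illustrates this; its function
+name is hypothetical and it only prints table text (it does not call dmsetup).
+A non-zero start sector corresponds to bypassing the LDM driver and using
+offsets into the main device, as described above::
+
+    def linear_table(components):
+        """Build device-mapper linear table lines.
+
+        components: iterable of (device, size_in_sectors, start_sector).
+        """
+        lines, offset = [], 0
+        for device, size, start in components:
+            lines.append(f"{offset} {size} linear {device} {start}")
+            offset += size  # the next component begins right after this one
+        return "\n".join(lines)
+
+    print(linear_table([
+        ("/dev/hda1", 1028161, 0),
+        ("/dev/hdb2", 3903762, 0),
+        ("/dev/hdc1", 2103211, 0),
+    ]))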
+
+For a striped volume, i.e. raid level 0, you will need to know the chunk size
+you used when creating the volume.  Windows uses 64 KiB as the default, so it
+will probably be this unless you changed the defaults when creating the array.
+
+For a raid level 0 the table would look like this (note all values are in
+512-byte sectors)::
+
+    # Offset   Size     Raid     Number   Chunk  1st        Start   2nd       Start
+    # into     of the   type     of       size   Device     in      Device    in
+    # volume   volume            stripes                    device            device
+    0          2056320  striped  2        128    /dev/hda1  0       /dev/hdb1 0
+
+If there are more than two devices, append each additional device and its
+start sector to the end of the line and increase the number of stripes to
+match.
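+
+The chunk size in the table is given in sectors, so the 64 KiB Windows default
+becomes 64 * 1024 / 512 = 128.  The following Python sketch (hypothetical
+helper names; it only builds text) performs that conversion and assembles a
+striped line from a list of (device, start sector) pairs, deriving the number
+of stripes from the length of that list::
+
+    SECTOR_SIZE = 512  # all values in the table are 512-byte sectors
+
+    def kib_to_sectors(kib: int) -> int:
+        return kib * 1024 // SECTOR_SIZE
+
+    def striped_line(offset, size, chunk_kib, devices):
+        """devices: list of (device, start_sector) pairs, one per stripe."""
+        parts = [str(offset), str(size), "striped", str(len(devices)),
+                 str(kib_to_sectors(chunk_kib))]
+        for device, start in devices:
+            parts += [device, str(start)]
+        return " ".join(parts)
+
+    print(striped_line(0, 2056320, 64, [("/dev/hda1", 0), ("/dev/hdb1", 0)]))
+    # 0 2056320 striped 2 128 /dev/hda1 0 /dev/hdb1 0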
+
+Finally, for a mirrored volume, i.e. raid level 1, the table would look like
+this (note all values are in 512-byte sectors)::
+
+    # Ofs Size    Raid   Log  Number Region Should Number  Source    Start  Target    Start
+    # in  of the  type   type of log size   sync?  of      Device    in     Device    in
+    # vol volume              params               mirrors           Device           Device
+    0     2056320 mirror core 2      16     nosync 2       /dev/hda1 0      /dev/hdb1 0
+
+If you are mirroring to multiple devices, you can specify further targets at
+the end of the line, increasing the number of mirrors accordingly.
+
+Note that the "Should sync?" parameter "nosync" means that the two mirrors are
+already in sync, which will be the case after a clean shutdown of Windows.  If
+the mirrors are not clean, you can specify the "sync" option instead of
+"nosync" and the Device-Mapper driver will then copy the entirety of the
+"Source Device" to the "Target Device" or, if you specified multiple target
+devices, to all of them.
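+
+In the mirror line above, the "core" log takes two parameters (the region size
+and the sync flag), which is why the "Number of log params" column is 2.  The
+Python sketch below (a hypothetical helper that only builds text) reproduces
+the example line, lets you choose between "nosync" and "sync", and derives the
+number of mirrors from the device list::
+
+    def mirror_line(offset, size, region_size, devices, in_sync=True):
+        """devices: (device, start_sector) pairs; the first is the source."""
+        log_params = [str(region_size), "nosync" if in_sync else "sync"]
+        parts = [str(offset), str(size), "mirror", "core",
+                 str(len(log_params)), *log_params, str(len(devices))]
+        for device, start in devices:
+            parts += [device, str(start)]
+        return " ".join(parts)
+
+    print(mirror_line(0, 2056320, 16, [("/dev/hda1", 0), ("/dev/hdb1", 0)]))
+    # 0 2056320 mirror core 2 16 nosync 2 /dev/hda1 0 /dev/hdb1 0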
+
+Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
+and hand it over to dmsetup to work with, like so::
+
+    $ dmsetup create myvolume1 /etc/ntfsvolume1
+
+You can obviously replace "myvolume1" with whatever name you like.
+
+If this worked, you will now have the device /dev/mapper/myvolume1, which you
+can then use as an argument to the mount command as usual to mount the ntfs
+volume.  For example::
+
+    $ mount -t ntfs -o ro /dev/mapper/myvolume1 /mnt/myvol1
+
+(You need to create the directory /mnt/myvol1 first and of course you can use
+anything you like instead of /mnt/myvol1 as long as it is an existing
+directory.)
+
+It is advisable to mount the volume read-only at first, to check that it has
+been set up correctly without risking damage to the data on the ntfs volume.
+
+
+The Software RAID / MD driver
+-----------------------------
+
+An alternative to using the Device-Mapper driver is to use the kernel's
+Software RAID / MD driver.  To do so, you need to set up your /etc/raidtab
+appropriately (see man 5 raidtab).
+
+Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. raid level
+0, have been tested and work fine (though see section "Limitations when using
+the Software RAID / MD driver", especially if you want to use linear raid).
+Even though untested, there is no reason why mirrors, i.e. raid level 1, and
+stripes with parity, i.e. raid level 5, should not work, too.
+
+You have to use the "persistent-superblock 0" option for each raid-disk in the
+NTFS volume/stripe you are configuring in /etc/raidtab, because the persistent
+superblock used by the MD driver would damage the NTFS volume.
+
+Windows by default uses a stripe chunk size of 64k, so you probably want the
+"chunk-size 64k" option for each raid-disk, too.
+
+For example, if you have a stripe set consisting of two partitions /dev/hda5
+and /dev/hdb1, your /etc/raidtab would look like this::
+
+    raiddev /dev/md0
+            raid-level              0
+            nr-raid-disks           2
+            nr-spare-disks          0
+            persistent-superblock   0
+            chunk-size              64k
+            device                  /dev/hda5
+            raid-disk               0
+            device                  /dev/hdb1
+            raid-disk               1
+
+For linear raid, just change the raid-level above to "raid-level linear", for
+mirrors, change it to "raid-level 1", and for stripe sets with parity, change
+it to "raid-level 5".
+
+Note that for stripe sets with parity you will also need to tell the MD driver
+which parity algorithm to use by specifying the option "parity-algorithm
+which", where "which" is replaced by the name of the algorithm (see man 5
+raidtab for the available algorithms).  You will have to try the available
+algorithms until you find one that works.  Make sure you are working read-only
+when experimenting with this, as you may otherwise damage your data.  If you
+find which algorithm works, please let us know (email the linux-ntfs
+developers list linux-ntfs-dev@lists.sourceforge.net or drop in on IRC in
+channel #ntfs on the irc.freenode.net network) so we can update this
+documentation.
+
+Once the raidtab is set up, run, for example, raid0run -a to start all devices
+or raid0run /dev/md0 to start a particular md device, in this case /dev/md0.
+
+Then mount the ntfs volume as usual, for example::
+
+    $ mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
+
+It is advisable to mount the md volume read-only at first, to check that it
+has been set up correctly without risking damage to the data on the ntfs
+volume.
+
+
+Limitations when using the Software RAID / MD driver
+-----------------------------------------------------
+
+Using the md driver will not work properly if any of your NTFS partitions has
+an odd number of sectors.  This is especially important for linear raid: all
+data after the first partition with an odd number of sectors will be offset by
+one or more sectors.  If you mount such a volume with write support, you will
+cause massive damage to the data on it, which will only become apparent when
+you next try to use the volume under Windows.
+
+So when using linear raid, make sure that all your partitions have an even
+number of sectors BEFORE attempting to use it.  You have been warned!
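+
+A quick way to check this is to test every component size for oddness before
+building the array.  The Python sketch below uses a hypothetical helper; the
+sizes must come from your own fdisk or ldminfo output, and the values shown
+are derived from the example fdisk listing earlier in this document::
+
+    def odd_sized(sizes: dict[str, int]) -> list[str]:
+        """Return the devices whose size in 512-byte sectors is odd."""
+        return [dev for dev, sectors in sizes.items() if sectors % 2]
+
+    sizes = {"/dev/hda1": 4208967, "/dev/hda2": 33559785}
+    print(odd_sized(sizes))  # ['/dev/hda1', '/dev/hda2']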
+
+Even better, use the Device-Mapper driver for linear raid, which does not have
+this problem with odd numbers of sectors.
diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt
deleted file mode 100644
index 553f10d03076..000000000000
--- a/Documentation/filesystems/ntfs.txt
+++ /dev/null
@@ -1,451 +0,0 @@
-The Linux NTFS filesystem driver
-================================
-
-
-Table of contents
-=================
-
-- Overview
-- Web site
-- Features
-- Supported mount options
-- Known bugs and (mis-)features
-- Using NTFS volume and stripe sets
-  - The Device-Mapper driver
-  - The Software RAID / MD driver
-  - Limitations when using the MD driver
-
-
-Overview
-========
-
-Linux-NTFS comes with a number of user-space programs known as ntfsprogs.
-These include mkntfs, a full-featured ntfs filesystem format utility,
-ntfsundelete used for recovering files that were unintentionally deleted
-from an NTFS volume and ntfsresize which is used to resize an NTFS partition.
-See the web site for more information.
-
-To mount an NTFS 1.2/3.x (Windows NT4/2000/XP/2003) volume, use the file
-system type 'ntfs'.  The driver currently supports read-only mode (with no
-fault-tolerance, encryption or journalling) and very limited, but safe, write
-support.
-
-For fault tolerance and raid support (i.e. volume and stripe sets), you can
-use the kernel's Software RAID / MD driver.  See section "Using Software RAID
-with NTFS" for details.
-
-
-Web site
-========
-
-There is plenty of additional information on the linux-ntfs web site
-at http://www.linux-ntfs.org/
-
-The web site has a lot of additional information, such as a comprehensive
-FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS
-userspace utilities, etc.
-
-
-Features
-========
-
-- This is a complete rewrite of the NTFS driver that used to be in the 2.4 and
-  earlier kernels.  This new driver implements NTFS read support and is
-  functionally equivalent to the old ntfs driver and it also implements limited
-  write support.  The biggest limitation at present is that files/directories
-  cannot be created or deleted.  See below for the list of write features that
-  are so far supported.  Another limitation is that writing to compressed files
-  is not implemented at all.  Also, neither read nor write access to encrypted
-  files is so far implemented.
-- The new driver has full support for sparse files on NTFS 3.x volumes which
-  the old driver isn't happy with.
-- The new driver supports execution of binaries due to mmap() now being
-  supported.
-- The new driver supports loopback mounting of files on NTFS which is used by
-  some Linux distributions to enable the user to run Linux from an NTFS
-  partition by creating a large file while in Windows and then loopback
-  mounting the file while in Linux and creating a Linux filesystem on it that
-  is used to install Linux on it.
-- A comparison of the two drivers using:
-	time find . -type f -exec md5sum "{}" \;
-  run three times in sequence with each driver (after a reboot) on a 1.4GiB
-  NTFS partition, showed the new driver to be 20% faster in total time elapsed
-  (from 9:43 minutes on average down to 7:53).  The time spent in user space
-  was unchanged but the time spent in the kernel was decreased by a factor of
-  2.5 (from 85 CPU seconds down to 33).
-- The driver does not support short file names in general.  For backwards
-  compatibility, we implement access to files using their short file names if
-  they exist.  The driver will not create short file names however, and a
-  rename will discard any existing short file name.
-- The new driver supports exporting of mounted NTFS volumes via NFS.
-- The new driver supports async io (aio).
-- The new driver supports fsync(2), fdatasync(2), and msync(2).
-- The new driver supports readv(2) and writev(2).
-- The new driver supports access time updates (including mtime and ctime).
-- The new driver supports truncate(2) and open(2) with O_TRUNC.  But at present
-  only very limited support for highly fragmented files, i.e. ones which have
-  their data attribute split across multiple extents, is included.  Another
-  limitation is that at present truncate(2) will never create sparse files,
-  since to mark a file sparse we need to modify the directory entry for the
-  file and we do not implement directory modifications yet.
-- The new driver supports write(2) which can both overwrite existing data and
-  extend the file size so that you can write beyond the existing data.  Also,
-  writing into sparse regions is supported and the holes are filled in with
-  clusters.  But at present only limited support for highly fragmented files,
-  i.e. ones which have their data attribute split across multiple extents, is
-  included.  Another limitation is that write(2) will never create sparse
-  files, since to mark a file sparse we need to modify the directory entry for
-  the file and we do not implement directory modifications yet.
-
-Supported mount options
-=======================
-
-In addition to the generic mount options described by the manual page for the
-mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the
-following mount options:
-
-iocharset=name		Deprecated option.  Still supported but please use
-			nls=name in the future.  See description for nls=name.
-
-nls=name		Character set to use when returning file names.
-			Unlike VFAT, NTFS suppresses names that contain
-			unconvertible characters.  Note that most character
-			sets contain insufficient characters to represent all
-			possible Unicode characters that can exist on NTFS.
-			To be sure you are not missing any files, you are
-			advised to use nls=utf8 which is capable of
-			representing all Unicode characters.
-
-utf8=<bool>		Option no longer supported.  Currently mapped to
-			nls=utf8 but please use nls=utf8 in the future and
-			make sure utf8 is compiled either as module or into
-			the kernel.  See description for nls=name.
-
-uid=
-gid=
-umask=			Provide default owner, group, and access mode mask.
-			These options work as documented in mount(8).  By
-			default, the files/directories are owned by root and
-			he/she has read and write permissions, as well as
-			browse permission for directories.  No one else has any
-			access permissions.  I.e. the mode on all files is by
-			default rw------- and for directories rwx------, a
-			consequence of the default fmask=0177 and dmask=0077.
-			Using a umask of zero will grant all permissions to
-			everyone, i.e. all files and directories will have mode
-			rwxrwxrwx.
-
-fmask=
-dmask=			Instead of specifying umask which applies both to
-			files and directories, fmask applies only to files and
-			dmask only to directories.
-
-sloppy=<BOOL>		If sloppy is specified, ignore unknown mount options.
-			Otherwise the default behaviour is to abort mount if
-			any unknown options are found.
-
-show_sys_files=<BOOL>	If show_sys_files is specified, show the system files
-			in directory listings.  Otherwise the default behaviour
-			is to hide the system files.
-			Note that even when show_sys_files is specified, "$MFT"
-			will not be visible due to bugs/mis-features in glibc.
-			Further, note that irrespective of show_sys_files, all
-			files are accessible by name, i.e. you can always do
-			"ls -l \$UpCase" for example to specifically show the
-			system file containing the Unicode upcase table.
-
-case_sensitive=<BOOL>	If case_sensitive is specified, treat all file names as
-			case sensitive and create file names in the POSIX
-			namespace.  Otherwise the default behaviour is to treat
-			file names as case insensitive and to create file names
-			in the WIN32/LONG name space.  Note, the Linux NTFS
-			driver will never create short file names and will
-			remove them on rename/delete of the corresponding long
-			file name.
-			Note that files remain accessible via their short file
-			name, if it exists.  If case_sensitive, you will need
-			to provide the correct case of the short file name.
-
-disable_sparse=<BOOL>	If disable_sparse is specified, creation of sparse
-			regions, i.e. holes, inside files is disabled for the
-			volume (for the duration of this mount only).  By
-			default, creation of sparse regions is enabled, which
-			is consistent with the behaviour of traditional Unix
-			filesystems.
-
-errors=opt		What to do when critical filesystem errors are found.
-			Following values can be used for "opt":
-			  continue: DEFAULT, try to clean-up as much as
-				    possible, e.g. marking a corrupt inode as
-				    bad so it is no longer accessed, and then
-				    continue.
-			  recover:  At present only supported is recovery of
-				    the boot sector from the backup copy.
-				    If read-only mount, the recovery is done
-				    in memory only and not written to disk.
-			Note that the options are additive, i.e. specifying:
-			   errors=continue,errors=recover
-			means the driver will attempt to recover and if that
-			fails it will clean-up as much as possible and
-			continue.
-
-mft_zone_multiplier=	Set the MFT zone multiplier for the volume (this
-			setting is not persistent across mounts and can be
-			changed from mount to mount but cannot be changed on
-			remount).  Values of 1 to 4 are allowed, 1 being the
-			default.  The MFT zone multiplier determines how much
-			space is reserved for the MFT on the volume.  If all
-			other space is used up, then the MFT zone will be
-			shrunk dynamically, so this has no impact on the
-			amount of free space.  However, it can have an impact
-			on performance by affecting fragmentation of the MFT.
-			In general use the default.  If you have a lot of small
-			files then use a higher value.  The values have the
-			following meaning:
-			      Value	     MFT zone size (% of volume size)
-				1		12.5%
-				2		25%
-				3		37.5%
-				4		50%
-			Note this option is irrelevant for read-only mounts.
-
-
-Known bugs and (mis-)features
-=============================
-
-- The link count on each directory inode entry is set to 1, due to Linux not
-  supporting directory hard links.  This may well confuse some user space
-  applications, since the directory names will have the same inode numbers.
-  This also speeds up ntfs_read_inode() immensely.  And we haven't found any
-  problems with this approach so far.  If you find a problem with this, please
-  let us know.
-
-
-Please send bug reports/comments/feedback/abuse to the Linux-NTFS development
-list at sourceforge: linux-ntfs-dev@lists.sourceforge.net
-
-
-Using NTFS volume and stripe sets
-=================================
-
-For support of volume and stripe sets, you can either use the kernel's
-Device-Mapper driver or the kernel's Software RAID / MD driver.  The former is
-the recommended one to use for linear raid.  But the latter is required for
-raid level 5.  For striping and mirroring, either driver should work fine.
-
-
-The Device-Mapper driver
-------------------------
-
-You will need to create a table of the components of the volume/stripe set and
-how they fit together and load this into the kernel using the dmsetup utility
-(see man 8 dmsetup).
-
-Linear volume sets, i.e. linear raid, has been tested and works fine.  Even
-though untested, there is no reason why stripe sets, i.e. raid level 0, and
-mirrors, i.e. raid level 1 should not work, too.  Stripes with parity, i.e.
-raid level 5, unfortunately cannot work yet because the current version of the
-Device-Mapper driver does not support raid level 5.  You may be able to use the
-Software RAID / MD driver for raid level 5, see the next section for details.
-
-To create the table describing your volume you will need to know each of its
-components and their sizes in sectors, i.e. multiples of 512-byte blocks.
-
-For NT4 fault tolerant volumes you can obtain the sizes using fdisk.  So for
-example if one of your partitions is /dev/hda2 you would do:
-
-$ fdisk -ul /dev/hda
-
-Disk /dev/hda: 81.9 GB, 81964302336 bytes
-255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
-Units = sectors of 1 * 512 = 512 bytes
-
-   Device Boot      Start         End      Blocks   Id  System
-   /dev/hda1   *          63     4209029     2104483+  83  Linux
-   /dev/hda2         4209030    37768814    16779892+  86  NTFS
-   /dev/hda3        37768815    46170809     4200997+  83  Linux
-
-And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
-33559785 sectors.
-
-For Win2k and later dynamic disks, you can for example use the ldminfo utility
-which is part of the Linux LDM tools (the latest version at the time of
-writing is linux-ldm-0.0.8.tar.bz2).  You can download it from:
-	http://www.linux-ntfs.org/
-Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
-into it (cd linux-ldm-0.0.8) and change to the test directory (cd test).  You
-will find the precompiled (i386) ldminfo utility there.  NOTE: You will not be
-able to compile this yourself easily so use the binary version!
-
-Then you would use ldminfo in dump mode to obtain the necessary information:
-
-$ ./ldminfo --dump /dev/hda
-
-This would dump the LDM database found on /dev/hda which describes all of your
-dynamic disks and all the volumes on them.  At the bottom you will see the
-VOLUME DEFINITIONS section which is all you really need.  You may need to look
-further above to determine which of the disks in the volume definitions is
-which device in Linux.  Hint: Run ldminfo on each of your dynamic disks and
-look at the Disk Id close to the top of the output for each (the PRIVATE HEADER
-section).  You can then find these Disk Ids in the VBLK DATABASE section in the
-<Disk> components where you will get the LDM Name for the disk that is found in
-the VOLUME DEFINITIONS section.
-
-Note you will also need to enable the LDM driver in the Linux kernel.  If your
-distribution did not enable it, you will need to recompile the kernel with it
-enabled.  This will create the LDM partitions on each device at boot time.  You
-would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3, etc)
-in the Device-Mapper table.
-
-You can also bypass using the LDM driver by using the main device (e.g.
-/dev/hda) and then using the offsets of the LDM partitions into this device as
-the "Start sector of device" when creating the table.  Once again ldminfo would
-give you the correct information to do this.
-
-Assuming you know all your devices and their sizes things are easy.
-
-For a linear raid the table would look like this (note all values are in
-512-byte sectors):
-
---- cut here ---
-# Offset into	Size of this	Raid type	Device		Start sector
-# volume	device						of device
-0		1028161		linear		/dev/hda1	0
-1028161		3903762		linear		/dev/hdb2	0
-4931923		2103211		linear		/dev/hdc1	0
---- cut here ---
-
-For a striped volume, i.e. raid level 0, you will need to know the chunk size
-you used when creating the volume.  Windows uses 64kiB as the default, so it
-will probably be this unless you changes the defaults when creating the array.
-
-For a raid level 0 the table would look like this (note all values are in
-512-byte sectors):
-
---- cut here ---
-# Offset   Size	    Raid     Number   Chunk  1st        Start	2nd	  Start
-# into     of the   type     of	      size   Device	in	Device	  in
-# volume   volume	     stripes			device		  device
-0	   2056320  striped  2	      128    /dev/hda1	0	/dev/hdb1 0
---- cut here ---
-
-If there are more than two devices, just add each of them to the end of the
-line.
-
-Finally, for a mirrored volume, i.e. raid level 1, the table would look like
-this (note all values are in 512-byte sectors):
-
---- cut here ---
-# Ofs Size   Raid   Log  Number Region Should Number Source  Start Target Start
-# in  of the type   type of log size   sync?  of     Device  in    Device in
-# vol volume		 params		     mirrors	     Device	  Device
-0    2056320 mirror core 2	16     nosync 2	   /dev/hda1 0   /dev/hdb1 0
---- cut here ---
-
-If you are mirroring to multiple devices you can specify further targets at the
-end of the line.
-
-Note the "Should sync?" parameter "nosync" means that the two mirrors are
-already in sync which will be the case on a clean shutdown of Windows.  If the
-mirrors are not clean, you can specify the "sync" option instead of "nosync"
-and the Device-Mapper driver will then copy the entirety of the "Source Device"
-to the "Target Device" or if you specified multiple target devices to all of
-them.
-
-Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
-and hand it over to dmsetup to work with, like so:
-
-$ dmsetup create myvolume1 /etc/ntfsvolume1
-
-You can obviously replace "myvolume1" with whatever name you like.
-
-If it all worked, you will now have the device /dev/device-mapper/myvolume1
-which you can then just use as an argument to the mount command as usual to
-mount the ntfs volume.  For example:
-
-$ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
-
-(You need to create the directory /mnt/myvol1 first and of course you can use
-anything you like instead of /mnt/myvol1 as long as it is an existing
-directory.)
-
-It is advisable to do the mount read-only to see if the volume has been setup
-correctly to avoid the possibility of causing damage to the data on the ntfs
-volume.
-
-
-The Software RAID / MD driver
------------------------------
-
-An alternative to using the Device-Mapper driver is to use the kernel's
-Software RAID / MD driver.  For which you need to set up your /etc/raidtab
-appropriately (see man 5 raidtab).
-
-Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. raid level
-0, have been tested and work fine (though see section "Limitations when using
-the MD driver with NTFS volumes" especially if you want to use linear raid).
-Even though untested, there is no reason why mirrors, i.e. raid level 1, and
-stripes with parity, i.e. raid level 5, should not work, too.
-
-You have to use the "persistent-superblock 0" option for each raid-disk in the
-NTFS volume/stripe you are configuring in /etc/raidtab as the persistent
-superblock used by the MD driver would damage the NTFS volume.
-
-Windows by default uses a stripe chunk size of 64k, so you probably want the
-"chunk-size 64k" option for each raid-disk, too.
-
-For example, if you have a stripe set consisting of two partitions /dev/hda5
-and /dev/hdb1 your /etc/raidtab would look like this:
-
-raiddev /dev/md0
-	raid-level	0
-	nr-raid-disks	2
-	nr-spare-disks	0
-	persistent-superblock	0
-	chunk-size	64k
-	device		/dev/hda5
-	raid-disk	0
-	device		/dev/hdb1
-	raid-disk	1
-
-For linear raid, just change the raid-level above to "raid-level linear", for
-mirrors, change it to "raid-level 1", and for stripe sets with parity, change
-it to "raid-level 5".
-
-Note for stripe sets with parity you will also need to tell the MD driver
-which parity algorithm to use by specifying the option "parity-algorithm
-which", where you need to replace "which" with the name of the algorithm to
-use (see man 5 raidtab for available algorithms) and you will have to try the
-different available algorithms until you find one that works.  Make sure you
-are working read-only when playing with this as you may damage your data
-otherwise.  If you find which algorithm works please let us know (email the
-linux-ntfs developers list linux-ntfs-dev@lists.sourceforge.net or drop in on
-IRC in channel #ntfs on the irc.freenode.net network) so we can update this
-documentation.
-
-Once the raidtab is setup, run for example raid0run -a to start all devices or
-raid0run /dev/md0 to start a particular md device, in this case /dev/md0.
-
-Then just use the mount command as usual to mount the ntfs volume using for
-example:	mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
-
-It is advisable to do the mount read-only to see if the md volume has been
-setup correctly to avoid the possibility of causing damage to the data on the
-ntfs volume.
-
-
-Limitations when using the Software RAID / MD driver
------------------------------------------------------
-
-Using the md driver will not work properly if any of your NTFS partitions have
-an odd number of sectors.  This is especially important for linear raid as all
-data after the first partition with an odd number of sectors will be offset by
-one or more sectors so if you mount such a partition with write support you
-will cause massive damage to the data on the volume which will only become
-apparent when you try to use the volume again under Windows.
-
-So when using linear raid, make sure that all your partitions have an even
-number of sectors BEFORE attempting to use it.  You have been warned!
-
-Even better is to simply use the Device-Mapper for linear raid and then you do
-not have this problem with odd numbers of sectors.
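The mirrored-volume table removed above follows the Device-Mapper "mirror" layout: offset, size, target type, log type, number of log parameters, region size, sync flag, number of mirrors, and then one device/start-sector pair per mirror. The C sketch below assembles such a line. It assumes a "core" log with exactly the two log parameters shown in the example (region size and sync flag), and struct mirror_leg and format_mirror_table() are hypothetical names. It is not part of dmsetup or any library.

```c
#include <stdio.h>

/* Hypothetical description of one mirror leg: device and start sector. */
struct mirror_leg {
	const char *device;		/* e.g. "/dev/hda1" */
	unsigned long long start;	/* start sector within that device */
};

/*
 * Format a whole-volume dm "mirror" table line with a "core" log:
 *   0 <size> mirror core 2 <region> <sync|nosync> <#mirrors> <dev> <start> ...
 * All sizes are in 512-byte sectors. Like snprintf(), returns the length
 * of the full line (even if truncated), or a negative value on error.
 */
static int format_mirror_table(char *buf, size_t len,
			       unsigned long long size_sectors,
			       unsigned int region_size, int already_in_sync,
			       const struct mirror_leg *legs, unsigned int nlegs)
{
	int n = snprintf(buf, len, "0 %llu mirror core 2 %u %s %u",
			 size_sectors, region_size,
			 already_in_sync ? "nosync" : "sync", nlegs);

	for (unsigned int i = 0; i < nlegs && n >= 0; i++) {
		/* Append after what fit so far; once full, only count length. */
		size_t used = (size_t)n < len ? (size_t)n : len;
		int m = snprintf(buf + used, len - used, " %s %llu",
				 legs[i].device, legs[i].start);

		if (m < 0)
			return m;
		n += m;
	}
	return n;
}
```

Called with the example's values (2056320 sectors, region size 16, already in sync, /dev/hda1 and /dev/hdb1 both starting at sector 0), this produces the same 59-character line as the example table, which could then be saved to a file and passed to dmsetup create.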
-- 


From 3d0c60d004644630f1431ce486e76adcc829e288 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:14 +0100
Subject: docs: filesystems: convert ocfs2-online-filecheck.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6007166acc3252697755836354bd29b5a5fb82aa.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst                |  1 +
 .../filesystems/ocfs2-online-filecheck.rst         | 99 ++++++++++++++++++++++
 .../filesystems/ocfs2-online-filecheck.txt         | 94 --------------------
 3 files changed, 100 insertions(+), 94 deletions(-)
 create mode 100644 Documentation/filesystems/ocfs2-online-filecheck.rst
 delete mode 100644 Documentation/filesystems/ocfs2-online-filecheck.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 62be53c4755d..f3a26fdbd04f 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -76,6 +76,7 @@ Documentation for filesystem implementations.
    nilfs2
    nfs/index
    ntfs
+   ocfs2-online-filecheck
    overlayfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/ocfs2-online-filecheck.rst b/Documentation/filesystems/ocfs2-online-filecheck.rst
new file mode 100644
index 000000000000..2257bb53edc1
--- /dev/null
+++ b/Documentation/filesystems/ocfs2-online-filecheck.rst
@@ -0,0 +1,99 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+OCFS2 file system - online file check
+=====================================
+
+This document describes the OCFS2 online file check feature.
+
+Introduction
+============
+OCFS2 is often used in high-availability systems. However, OCFS2 usually
+converts the filesystem to read-only when it encounters an error. This may not
+be necessary, since turning the filesystem read-only affects other running
+processes as well, decreasing availability.
+To address this, a mount option (errors=continue) was introduced, which returns
+the -EIO errno to the calling process and terminates further processing so that
+the filesystem is not corrupted further. The filesystem is not converted to
+read-only, and the problematic file's inode number is reported in the kernel
+log. The user can try to check/fix this file via the online filecheck feature.
+
+Scope
+=====
+This effort is to check/fix small issues which may hinder day-to-day operations
+of a cluster filesystem by turning the filesystem read-only. The scope of
+checking/fixing is at the file level, initially for regular files and eventually
+for all files (including system files) of the filesystem.
+
+If a directory-to-file link is incorrect, the directory inode is reported as
+erroneous.
+
+This feature is not suited for extensive checks that depend on other
+components of the filesystem, such as (but not limited to) checking whether
+the bits for file blocks in the allocator have been set. For such errors, an
+offline fsck is recommended.
+
+Finally, such an operation should not be automated, lest the filesystem end
+up with more damage than before the repair attempt. It must therefore be
+performed with user interaction and consent.
+
+User interface
+==============
+When there are errors in the OCFS2 filesystem, they are usually accompanied
+by the inode number which caused the error. This inode number is the
+input used to check or fix the file.
+
+There is a sysfs directory for each mounted OCFS2 filesystem::
+
+  /sys/fs/ocfs2/<devname>/filecheck
+
+Here, <devname> is the name of the already-mounted OCFS2 volume device. The
+check and fix files in this directory accept inode numbers, telling the
+kernel which file (identified by inode number) should be checked or fixed.
+Three operations are currently supported: checking an inode, fixing an inode,
+and setting the size of the result record history.
+
+1. To find out exactly which error affects <inode> before fixing it, run::
+
+    # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
+    # cat /sys/fs/ocfs2/<devname>/filecheck/check
+
+The output looks like this::
+
+    INO		DONE	ERROR
+    39502		1	GENERATION
+
+- <INO> lists the inode numbers.
+- <DONE> indicates whether the operation has finished.
+- <ERROR> shows what kind of error was found. For the detailed error numbers,
+  please refer to the file linux/fs/ocfs2/filecheck.h.
+
+2. If you decide to fix this inode, run::
+
+    # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
+    # cat /sys/fs/ocfs2/<devname>/filecheck/fix
+
+The output looks like this::
+
+    INO		DONE	ERROR
+    39502		1	SUCCESS
+
+This time, the <ERROR> column indicates whether the fix was successful.
+
+3. The record cache stores the history of check/fix results. Its default size
+is 10, and it can be adjusted within the range 10 to 100. You can adjust the
+size like this::
+
+  # echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set
+
+Fixing stuff
+============
+On receiving the inode number, the filesystem reads the inode and the file
+metadata. If it finds errors, the filesystem fixes them and reports the
+problems it fixed in the kernel log. As a precautionary measure, the inode
+must first be checked for errors before a final fix is performed.
+
+The inode and the result history are maintained temporarily in a small
+linked-list buffer, which contains the last N inodes that were checked or
+fixed. The detailed errors that were checked or fixed are printed in the
+kernel log.
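The check-then-fix workflow in the ocfs2-online-filecheck text above can also be driven from a program instead of with echo and cat. The C sketch below writes an inode number to the check or fix file and parses the INO/DONE/ERROR rows read back. All function names are hypothetical, and the parser assumes the whitespace-separated layout shown in the sample output; it has not been checked against the kernel's fs/ocfs2/filecheck.c.

```c
#include <stdio.h>
#include <string.h>

/* One row of the check/fix output: INO, DONE, ERROR. */
struct filecheck_record {
	unsigned long long ino;
	int done;
	char error[32];		/* e.g. "GENERATION" or "SUCCESS" */
};

/* Build /sys/fs/ocfs2/<devname>/filecheck/<op>; returns -1 if truncated. */
static int filecheck_path(char *buf, size_t len, const char *devname,
			  const char *op)
{
	int n = snprintf(buf, len, "/sys/fs/ocfs2/%s/filecheck/%s", devname, op);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Submit an inode number to the "check" or "fix" file, as echo would. */
static int filecheck_submit(const char *devname, const char *op,
			    unsigned long long ino)
{
	char path[256];
	FILE *f;
	int ok;

	if (filecheck_path(path, sizeof(path), devname, op) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%llu\n", ino) > 0;
	/* stdio buffers the text, so the sysfs write may only fail here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Parse a data row such as "39502\t\t1\tGENERATION"; the header row fails. */
static int filecheck_parse_row(const char *line, struct filecheck_record *rec)
{
	return sscanf(line, "%llu %d %31s", &rec->ino, &rec->done,
		      rec->error) == 3;
}

/* True if a parsed row (e.g. from the "fix" file) is done and reports SUCCESS. */
static int filecheck_succeeded(const struct filecheck_record *rec)
{
	return rec->done && strcmp(rec->error, "SUCCESS") == 0;
}

/* Read back the check/fix file and keep the last row in it matching ino. */
static int filecheck_lookup(const char *devname, const char *op,
			    unsigned long long ino,
			    struct filecheck_record *out)
{
	char path[256], line[128];
	struct filecheck_record rec;
	int found = 0;
	FILE *f;

	if (filecheck_path(path, sizeof(path), devname, op) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (filecheck_parse_row(line, &rec) && rec.ino == ino) {
			*out = rec;
			found = 1;
		}
	}
	fclose(f);
	return found;
}
```

Following the precaution in the text, a caller would submit to "check" and inspect the ERROR column first, and only then submit the same inode to "fix" and confirm the result with filecheck_succeeded().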
diff --git a/Documentation/filesystems/ocfs2-online-filecheck.txt b/Documentation/filesystems/ocfs2-online-filecheck.txt
deleted file mode 100644
index 139fab175c8a..000000000000
--- a/Documentation/filesystems/ocfs2-online-filecheck.txt
+++ /dev/null
@@ -1,94 +0,0 @@
-		    OCFS2 online file check
-		    -----------------------
-
-This document will describe OCFS2 online file check feature.
-
-Introduction
-============
-OCFS2 is often used in high-availability systems. However, OCFS2 usually
-converts the filesystem to read-only when encounters an error. This may not be
-necessary, since turning the filesystem read-only would affect other running
-processes as well, decreasing availability.
-Then, a mount option (errors=continue) is introduced, which would return the
--EIO errno to the calling process and terminate further processing so that the
-filesystem is not corrupted further. The filesystem is not converted to
-read-only, and the problematic file's inode number is reported in the kernel
-log. The user can try to check/fix this file via online filecheck feature.
-
-Scope
-=====
-This effort is to check/fix small issues which may hinder day-to-day operations
-of a cluster filesystem by turning the filesystem read-only. The scope of
-checking/fixing is at the file level, initially for regular files and eventually
-to all files (including system files) of the filesystem.
-
-In case of directory to file links is incorrect, the directory inode is
-reported as erroneous.
-
-This feature is not suited for extravagant checks which involve dependency of
-other components of the filesystem, such as but not limited to, checking if the
-bits for file blocks in the allocation has been set. In case of such an error,
-the offline fsck should/would be recommended.
-
-Finally, such an operation/feature should not be automated lest the filesystem
-may end up with more damage than before the repair attempt. So, this has to
-be performed using user interaction and consent.
-
-User interface
-==============
-When there are errors in the OCFS2 filesystem, they are usually accompanied
-by the inode number which caused the error. This inode number would be the
-input to check/fix the file.
-
-There is a sysfs directory for each OCFS2 file system mounting:
-
-  /sys/fs/ocfs2/<devname>/filecheck
-
-Here, <devname> indicates the name of OCFS2 volume device which has been already
-mounted. The file above would accept inode numbers. This could be used to
-communicate with kernel space, tell which file(inode number) will be checked or
-fixed. Currently, three operations are supported, which includes checking
-inode, fixing inode and setting the size of result record history.
-
-1. If you want to know what error exactly happened to <inode> before fixing, do
-
-  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check
-  # cat /sys/fs/ocfs2/<devname>/filecheck/check
-
-The output is like this:
-  INO		DONE	ERROR
-39502		1	GENERATION
-
-<INO> lists the inode numbers.
-<DONE> indicates whether the operation has been finished.
-<ERROR> says what kind of errors was found. For the detailed error numbers,
-please refer to the file linux/fs/ocfs2/filecheck.h.
-
-2. If you determine to fix this inode, do
-
-  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix
-  # cat /sys/fs/ocfs2/<devname>/filecheck/fix
-
-The output is like this:
-  INO		DONE	ERROR
-39502		1	SUCCESS
-
-This time, the <ERROR> column indicates whether this fix is successful or not.
-
-3. The record cache is used to store the history of check/fix results. It's
-default size is 10, and can be adjust between the range of 10 ~ 100. You can
-adjust the size like this:
-
-  # echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set
-
-Fixing stuff
-============
-On receiving the inode, the filesystem would read the inode and the
-file metadata. In case of errors, the filesystem would fix the errors
-and report the problems it fixed in the kernel log. As a precautionary measure,
-the inode must first be checked for errors before performing a final fix.
-
-The inode and the result history will be maintained temporarily in a
-small linked list buffer which would contain the last (N) inodes
-fixed/checked, the detailed errors which were fixed/checked are printed in the
-kernel log.
-- 


From fa95e087ff69468b4e452c50c3f4c59a45846b8d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:15 +0100
Subject: docs: filesystems: convert ocfs2.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Link: https://lore.kernel.org/r/e29a8120bf1d847f23fb68e915f10a7d43bed9e3.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/ocfs2.rst | 117 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ocfs2.txt | 106 --------------------------------
 3 files changed, 118 insertions(+), 106 deletions(-)
 create mode 100644 Documentation/filesystems/ocfs2.rst
 delete mode 100644 Documentation/filesystems/ocfs2.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index f3a26fdbd04f..3b2b07491c98 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -76,6 +76,7 @@ Documentation for filesystem implementations.
    nilfs2
    nfs/index
    ntfs
+   ocfs2
    ocfs2-online-filecheck
    overlayfs
    virtiofs
diff --git a/Documentation/filesystems/ocfs2.rst b/Documentation/filesystems/ocfs2.rst
new file mode 100644
index 000000000000..412386bc6506
--- /dev/null
+++ b/Documentation/filesystems/ocfs2.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+OCFS2 filesystem
+================
+
+OCFS2 is a general-purpose, extent-based, shared-disk cluster file
+system with many similarities to ext3. It supports 64-bit inode
+numbers, and has automatically extending metadata groups which may
+also make it attractive for non-clustered use.
+
+You'll want to install the ocfs2-tools package in order to at least
+get "mount.ocfs2" and "ocfs2_hb_ctl".
+
+- Project web page:    http://ocfs2.wiki.kernel.org
+- Tools git tree:      https://github.com/markfasheh/ocfs2-tools
+- OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
+
+All code copyright 2005 Oracle except when otherwise noted.
+
+Credits
+=======
+
+Lots of code taken from ext3 and other projects.
+
+Authors in alphabetical order:
+
+- Joel Becker   <joel.becker@oracle.com>
+- Zach Brown    <zach.brown@oracle.com>
+- Mark Fasheh   <mfasheh@suse.com>
+- Kurt Hackel   <kurt.hackel@oracle.com>
+- Tao Ma        <tao.ma@oracle.com>
+- Sunil Mushran <sunil.mushran@oracle.com>
+- Manish Singh  <manish.singh@oracle.com>
+- Tiger Yang    <tiger.yang@oracle.com>
+
+Caveats
+=======
+Features which OCFS2 does not support yet:
+
+	- Directory change notification (F_NOTIFY)
+	- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
+
+Mount options
+=============
+
+OCFS2 supports the following mount options:
+
+Default values are marked with (*).
+
+======================= ========================================================
+barrier=1		This enables/disables barriers. barrier=0 disables it,
+			barrier=1 enables it.
+errors=remount-ro(*)	Remount the filesystem read-only on an error.
+errors=panic		Panic and halt the machine if an error occurs.
+intr		(*)	Allow signals to interrupt cluster operations.
+nointr			Do not allow signals to interrupt cluster
+			operations.
+noatime			Do not update access time.
+relatime(*)		Update atime if the previous atime is older than
+			mtime or ctime
+strictatime		Always update atime, but the minimum update interval
+			is specified by atime_quantum.
+atime_quantum=60(*)	OCFS2 will not update atime unless this number
+			of seconds has passed since the last update.
+			Set to zero to always update atime. This option needs
+			to be used together with strictatime.
+data=ordered	(*)	All data are forced directly out to the main file
+			system prior to its metadata being committed to the
+			journal.
+data=writeback		Data ordering is not preserved, data may be written
+			into the main file system after its metadata has been
+			committed to the journal.
+preferred_slot=0(*)	During mount, try to use this filesystem slot first. If
+			it is in use by another node, the first empty one found
+			will be chosen. Invalid values will be ignored.
+commit=nrsec	(*)	Ocfs2 can be told to sync all its data and metadata
+			every 'nrsec' seconds. The default value is 5 seconds.
+			This means that if you lose your power, you will lose
+			as much as the latest 5 seconds of work (your
+			filesystem will not be damaged though, thanks to the
+			journaling).  This default value (or any low value)
+			will hurt performance, but it's good for data-safety.
+			Setting it to 0 will have the same effect as leaving
+			it at the default (5 seconds).
+			Setting it to very large values will improve
+			performance.
+localalloc=8(*)		Allows custom localalloc size in MB. If the value is too
+			large, the fs will silently revert it to the default.
+localflocks		This disables cluster aware flock.
+inode64			Indicates that Ocfs2 is allowed to create inodes at
+			any location in the filesystem, including those which
+			will result in inode numbers occupying more than 32
+			bits of significance.
+user_xattr	(*)	Enables Extended User Attributes.
+nouser_xattr		Disables Extended User Attributes.
+acl			Enables POSIX Access Control Lists support.
+noacl		(*)	Disables POSIX Access Control Lists support.
+resv_level=2	(*)	Set how aggressive allocation reservations will be.
+			Valid values range from 0 (reservations off) to 8
+			(maximum space for reservations).
+dir_resv_level=	(*)	By default, directory reservations will scale with file
+			reservations - users should rarely need to change this
+			value. If allocation reservations are turned off, this
+			option will have no effect.
+coherency=full  (*)	Disallow concurrent O_DIRECT writes; the cluster inode
+			lock is taken to force other nodes to drop their caches,
+			so full cluster coherency is guaranteed even for
+			O_DIRECT writes.
+coherency=buffered	Allow concurrent O_DIRECT writes without an EX lock
+			among nodes, which improves performance at the risk of
+			reading stale data on other nodes.
+journal_async_commit	Commit block can be written to disk without waiting
+			for descriptor blocks. If enabled, older kernels cannot
+			mount the device. This will enable 'journal_checksum'
+			internally.
+======================= ========================================================
diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
deleted file mode 100644
index 4c49e5410595..000000000000
--- a/Documentation/filesystems/ocfs2.txt
+++ /dev/null
@@ -1,106 +0,0 @@
-OCFS2 filesystem
-==================
-OCFS2 is a general purpose extent based shared disk cluster file
-system with many similarities to ext3. It supports 64 bit inode
-numbers, and has automatically extending metadata groups which may
-also make it attractive for non-clustered use.
-
-You'll want to install the ocfs2-tools package in order to at least
-get "mount.ocfs2" and "ocfs2_hb_ctl".
-
-Project web page:    http://ocfs2.wiki.kernel.org
-Tools git tree:      https://github.com/markfasheh/ocfs2-tools
-OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
-
-All code copyright 2005 Oracle except when otherwise noted.
-
-CREDITS:
-Lots of code taken from ext3 and other projects.
-
-Authors in alphabetical order:
-Joel Becker   <joel.becker@oracle.com>
-Zach Brown    <zach.brown@oracle.com>
-Mark Fasheh   <mfasheh@suse.com>
-Kurt Hackel   <kurt.hackel@oracle.com>
-Tao Ma        <tao.ma@oracle.com>
-Sunil Mushran <sunil.mushran@oracle.com>
-Manish Singh  <manish.singh@oracle.com>
-Tiger Yang    <tiger.yang@oracle.com>
-
-Caveats
-=======
-Features which OCFS2 does not support yet:
-	- Directory change notification (F_NOTIFY)
-	- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
-
-Mount options
-=============
-
-OCFS2 supports the following mount options:
-(*) == default
-
-barrier=1		This enables/disables barriers. barrier=0 disables it,
-			barrier=1 enables it.
-errors=remount-ro(*)	Remount the filesystem read-only on an error.
-errors=panic		Panic and halt the machine if an error occurs.
-intr		(*)	Allow signals to interrupt cluster operations.
-nointr			Do not allow signals to interrupt cluster
-			operations.
-noatime			Do not update access time.
-relatime(*)		Update atime if the previous atime is older than
-			mtime or ctime
-strictatime		Always update atime, but the minimum update interval
-			is specified by atime_quantum.
-atime_quantum=60(*)	OCFS2 will not update atime unless this number
-			of seconds has passed since the last update.
-			Set to zero to always update atime. This option need
-			work with strictatime.
-data=ordered	(*)	All data are forced directly out to the main file
-			system prior to its metadata being committed to the
-			journal.
-data=writeback		Data ordering is not preserved, data may be written
-			into the main file system after its metadata has been
-			committed to the journal.
-preferred_slot=0(*)	During mount, try to use this filesystem slot first. If
-			it is in use by another node, the first empty one found
-			will be chosen. Invalid values will be ignored.
-commit=nrsec	(*)	Ocfs2 can be told to sync all its data and metadata
-			every 'nrsec' seconds. The default value is 5 seconds.
-			This means that if you lose your power, you will lose
-			as much as the latest 5 seconds of work (your
-			filesystem will not be damaged though, thanks to the
-			journaling).  This default value (or any low value)
-			will hurt performance, but it's good for data-safety.
-			Setting it to 0 will have the same effect as leaving
-			it at the default (5 seconds).
-			Setting it to very large values will improve
-			performance.
-localalloc=8(*)		Allows custom localalloc size in MB. If the value is too
-			large, the fs will silently revert it to the default.
-localflocks		This disables cluster aware flock.
-inode64			Indicates that Ocfs2 is allowed to create inodes at
-			any location in the filesystem, including those which
-			will result in inode numbers occupying more than 32
-			bits of significance.
-user_xattr	(*)	Enables Extended User Attributes.
-nouser_xattr		Disables Extended User Attributes.
-acl			Enables POSIX Access Control Lists support.
-noacl		(*)	Disables POSIX Access Control Lists support.
-resv_level=2	(*)	Set how aggressive allocation reservations will be.
-			Valid values are between 0 (reservations off) to 8
-			(maximum space for reservations).
-dir_resv_level=	(*)	By default, directory reservations will scale with file
-			reservations - users should rarely need to change this
-			value. If allocation reservations are turned off, this
-			option will have no effect.
-coherency=full  (*)	Disallow concurrent O_DIRECT writes, cluster inode
-			lock will be taken to force other nodes drop cache,
-			therefore full cluster coherency is guaranteed even
-			for O_DIRECT writes.
-coherency=buffered	Allow concurrent O_DIRECT writes without EX lock among
-			nodes, which gains high performance at risk of getting
-			stale data on other nodes.
-journal_async_commit	Commit block can be written to disk without waiting
-			for descriptor blocks. If enabled older kernels cannot
-			mount the device. This will enable 'journal_checksum'
-			internally.
-- 


From 7cbb468f0c70878fe64d324790ee049c1881af7c Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:16 +0100
Subject: docs: filesystems: convert omfs.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Bob Copeland <me@bobcopeland.com>
Link: https://lore.kernel.org/r/0c125c7c971d81a557ca954992b8d770a9d1e3e8.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/omfs.rst  | 112 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/omfs.txt  | 106 ----------------------------------
 3 files changed, 113 insertions(+), 106 deletions(-)
 create mode 100644 Documentation/filesystems/omfs.rst
 delete mode 100644 Documentation/filesystems/omfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 3b2b07491c98..fbee77175840 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -78,6 +78,7 @@ Documentation for filesystem implementations.
    ntfs
    ocfs2
    ocfs2-online-filecheck
+   omfs
    overlayfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/omfs.rst b/Documentation/filesystems/omfs.rst
new file mode 100644
index 000000000000..4c8bb3074169
--- /dev/null
+++ b/Documentation/filesystems/omfs.rst
@@ -0,0 +1,112 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+Optimized MPEG Filesystem (OMFS)
+================================
+
+Overview
+========
+
+OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
+and Rio Karma MP3 player.  The filesystem is extent-based, utilizing
+block sizes from 2k to 8k, with hash-based directories.  This
+filesystem driver may be used to read and write disks from these
+devices.
+
+Note, it is not recommended that this FS be used in place of a general
+filesystem for your own streaming media device.  Native Linux filesystems
+will likely perform better.
+
+More information is available at:
+
+    http://linux-karma.sf.net/
+
+Various utilities, including mkomfs and omfsck, are included with
+omfsprogs, available at:
+
+    http://bobcopeland.com/karma/
+
+Instructions are included in its README.
+
+Options
+=======
+
+OMFS supports the following mount-time options:
+
+    ============   ========================================
+    uid=n          make all files owned by specified user
+    gid=n          make all files owned by specified group
+    umask=xxx      set permission umask to xxx
+    fmask=xxx      set umask to xxx for files
+    dmask=xxx      set umask to xxx for directories
+    ============   ========================================
+
+Disk format
+===========
+
+OMFS discriminates between "sysblocks" and normal data blocks.  The sysblock
+group consists of super block information, file metadata, directory structures,
+and extents.  Each sysblock has a header containing CRCs of the entire
+sysblock, and may be mirrored in successive blocks on the disk.  A sysblock may
+have a smaller size than a data block, but since they are both addressed by the
+same 64-bit block number, any remaining space in the smaller sysblock is
+unused.
+
+Sysblock header information::
+
+    struct omfs_header {
+	    __be64 h_self;                  /* FS block where this is located */
+	    __be32 h_body_size;             /* size of useful data after header */
+	    __be16 h_crc;                   /* crc-ccitt of body_size bytes */
+	    char h_fill1[2];
+	    u8 h_version;                   /* version, always 1 */
+	    char h_type;                    /* OMFS_INODE_X */
+	    u8 h_magic;                     /* OMFS_IMAGIC */
+	    u8 h_check_xor;                 /* XOR of header bytes before this */
+	    __be32 h_fill2;
+    };
+
+Files and directories are both represented by omfs_inode::
+
+    struct omfs_inode {
+	    struct omfs_header i_head;      /* header */
+	    __be64 i_parent;                /* parent containing this inode */
+	    __be64 i_sibling;               /* next inode in hash bucket */
+	    __be64 i_ctime;                 /* ctime, in milliseconds */
+	    char i_fill1[35];
+	    char i_type;                    /* OMFS_[DIR,FILE] */
+	    __be32 i_fill2;
+	    char i_fill3[64];
+	    char i_name[OMFS_NAMELEN];      /* filename */
+	    __be64 i_size;                  /* size of file, in bytes */
+    };
+
+Directories in OMFS are implemented as a large hash table.  Filenames are
+hashed then prepended into the bucket list beginning at OMFS_DIR_START.
+Lookup requires hashing the filename, then seeking across i_sibling pointers
+until a match is found on i_name.  Empty buckets are represented by block
+pointers with all-1s (~0).
+
+A file is an omfs_inode structure followed by an extent table beginning at
+OMFS_EXTENT_START::
+
+    struct omfs_extent_entry {
+	    __be64 e_cluster;               /* start location of a set of blocks */
+	    __be64 e_blocks;                /* number of blocks after e_cluster */
+    };
+
+    struct omfs_extent {
+	    __be64 e_next;                  /* next extent table location */
+	    __be32 e_extent_count;          /* total # extents in this table */
+	    __be32 e_fill;
+	    struct omfs_extent_entry e_entry;       /* start of extent entries */
+    };
+
+Each extent holds the block offset followed by the number of blocks allocated
+to the extent.  The final extent in each table is a terminator with e_cluster
+being ~0 and e_blocks being the ones' complement of the total number of blocks
+in the table.
+
+If this table overflows, a continuation inode is written and pointed to by
+e_next.  These have a header but lack the rest of the inode structure.
+
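Two of the OMFS on-disk invariants described above can be checked mechanically. h_check_xor is the XOR of the header bytes before it, and the field sizes from h_self through h_magic add up to 8+4+2+2+1+1+1 = 19 bytes. The last extent entry is a terminator whose e_cluster is ~0 and whose e_blocks is the ones' complement of the block total. The C sketch below assumes that the extent entries have already been converted from big-endian to host order, that e_extent_count includes the terminator, and that the total covers only the entries before it. The helper names are hypothetical, and the crc-ccitt check on the body is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * h_check_xor covers the header bytes before it:
 * h_self (8) + h_body_size (4) + h_crc (2) + h_fill1 (2) +
 * h_version (1) + h_type (1) + h_magic (1) = 19 bytes.
 */
#define OMFS_HDR_XOR_BYTES	19

/* XOR of the raw header bytes preceding h_check_xor. */
static uint8_t omfs_header_xor(const unsigned char *hdr)
{
	uint8_t x = 0;

	for (size_t i = 0; i < OMFS_HDR_XOR_BYTES; i++)
		x ^= hdr[i];
	return x;
}

/* h_check_xor is the byte immediately after the bytes it covers. */
static int omfs_header_xor_ok(const unsigned char *hdr)
{
	return hdr[OMFS_HDR_XOR_BYTES] == omfs_header_xor(hdr);
}

/* An extent entry already converted from big-endian to host order. */
struct extent_entry_host {
	uint64_t cluster;	/* e_cluster */
	uint64_t blocks;	/* e_blocks */
};

/*
 * Validate the terminator (assumed to be the last of count entries) and
 * return the block total of the entries before it through *total.
 * Returns 1 if the table is consistent, 0 otherwise.
 */
static int omfs_check_extent_table(const struct extent_entry_host *e,
				   uint32_t count, uint64_t *total)
{
	const struct extent_entry_host *term;
	uint64_t sum = 0;

	if (count == 0)
		return 0;
	for (uint32_t i = 0; i + 1 < count; i++)
		sum += e[i].blocks;
	term = &e[count - 1];
	if (term->cluster != ~(uint64_t)0 || term->blocks != ~sum)
		return 0;
	if (total)
		*total = sum;
	return 1;
}
```

Storing the complement of the total in the terminator means a reader can recover the number of blocks in a table without summing every entry; the sketch sums them anyway so that it also detects a terminator that disagrees with the entries.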
diff --git a/Documentation/filesystems/omfs.txt b/Documentation/filesystems/omfs.txt
deleted file mode 100644
index 1d0d41ff5c65..000000000000
--- a/Documentation/filesystems/omfs.txt
+++ /dev/null
@@ -1,106 +0,0 @@
-Optimized MPEG Filesystem (OMFS)
-
-Overview
-========
-
-OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR
-and Rio Karma MP3 player.  The filesystem is extent-based, utilizing
-block sizes from 2k to 8k, with hash-based directories.  This
-filesystem driver may be used to read and write disks from these
-devices.
-
-Note, it is not recommended that this FS be used in place of a general
-filesystem for your own streaming media device.  Native Linux filesystems
-will likely perform better.
-
-More information is available at:
-
-    http://linux-karma.sf.net/
-
-Various utilities, including mkomfs and omfsck, are included with
-omfsprogs, available at:
-
-    http://bobcopeland.com/karma/
-
-Instructions are included in its README.
-
-Options
-=======
-
-OMFS supports the following mount-time options:
-
-    uid=n        - make all files owned by specified user
-    gid=n        - make all files owned by specified group
-    umask=xxx    - set permission umask to xxx
-    fmask=xxx    - set umask to xxx for files
-    dmask=xxx    - set umask to xxx for directories
-
-Disk format
-===========
-
-OMFS discriminates between "sysblocks" and normal data blocks.  The sysblock
-group consists of super block information, file metadata, directory structures,
-and extents.  Each sysblock has a header containing CRCs of the entire
-sysblock, and may be mirrored in successive blocks on the disk.  A sysblock may
-have a smaller size than a data block, but since they are both addressed by the
-same 64-bit block number, any remaining space in the smaller sysblock is
-unused.
-
-Sysblock header information:
-
-struct omfs_header {
-        __be64 h_self;                  /* FS block where this is located */
-        __be32 h_body_size;             /* size of useful data after header */
-        __be16 h_crc;                   /* crc-ccitt of body_size bytes */
-        char h_fill1[2];
-        u8 h_version;                   /* version, always 1 */
-        char h_type;                    /* OMFS_INODE_X */
-        u8 h_magic;                     /* OMFS_IMAGIC */
-        u8 h_check_xor;                 /* XOR of header bytes before this */
-        __be32 h_fill2;
-};
-
-Files and directories are both represented by omfs_inode:
-
-struct omfs_inode {
-        struct omfs_header i_head;      /* header */
-        __be64 i_parent;                /* parent containing this inode */
-        __be64 i_sibling;               /* next inode in hash bucket */
-        __be64 i_ctime;                 /* ctime, in milliseconds */
-        char i_fill1[35];
-        char i_type;                    /* OMFS_[DIR,FILE] */
-        __be32 i_fill2;
-        char i_fill3[64];
-        char i_name[OMFS_NAMELEN];      /* filename */
-        __be64 i_size;                  /* size of file, in bytes */
-};
-
-Directories in OMFS are implemented as a large hash table.  Filenames are
-hashed then prepended into the bucket list beginning at OMFS_DIR_START.
-Lookup requires hashing the filename, then seeking across i_sibling pointers
-until a match is found on i_name.  Empty buckets are represented by block
-pointers with all-1s (~0).
-
-A file is an omfs_inode structure followed by an extent table beginning at
-OMFS_EXTENT_START:
-
-struct omfs_extent_entry {
-        __be64 e_cluster;               /* start location of a set of blocks */
-        __be64 e_blocks;                /* number of blocks after e_cluster */
-};
-
-struct omfs_extent {
-        __be64 e_next;                  /* next extent table location */
-        __be32 e_extent_count;          /* total # extents in this table */
-        __be32 e_fill;
-        struct omfs_extent_entry e_entry;       /* start of extent entries */
-};
-
-Each extent holds the block offset followed by number of blocks allocated to
-the extent.  The final extent in each table is a terminator with e_cluster
-being ~0 and e_blocks being ones'-complement of the total number of blocks
-in the table.
-
-If this table overflows, a continuation inode is written and pointed to by
-e_next.  These have a header but lack the rest of the inode structure.
-
-- 


From 18ccb2233fc5f7c27b5be17f5b6585c2fa62d919 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:17 +0100
Subject: docs: filesystems: convert orangefs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6f438eeff5b029d229197a602bd9b74004fe9b63.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst    |   1 +
 Documentation/filesystems/orangefs.rst | 554 +++++++++++++++++++++++++++++++++
 Documentation/filesystems/orangefs.txt | 529 -------------------------------
 3 files changed, 555 insertions(+), 529 deletions(-)
 create mode 100644 Documentation/filesystems/orangefs.rst
 delete mode 100644 Documentation/filesystems/orangefs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index fbee77175840..fed53f831192 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -79,6 +79,7 @@ Documentation for filesystem implementations.
    ocfs2
    ocfs2-online-filecheck
    omfs
+   orangefs
    overlayfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/orangefs.rst b/Documentation/filesystems/orangefs.rst
new file mode 100644
index 000000000000..7d6d4cad73c4
--- /dev/null
+++ b/Documentation/filesystems/orangefs.rst
@@ -0,0 +1,554 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========
+ORANGEFS
+========
+
+OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal
+for large storage problems such as those faced in HPC, big data, streaming
+video, genomics, and bioinformatics.
+
+Orangefs, originally called PVFS, was first developed in 1993 by
+Walt Ligon and Eric Blumer as a parallel file system for Parallel
+Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns
+of parallel programs.
+
+Orangefs features include:
+
+  * Distributes file data among multiple file servers
+  * Supports simultaneous access by multiple clients
+  * Stores file data and metadata on servers using local file system
+    and access methods
+  * Userspace implementation is easy to install and maintain
+  * Direct MPI support
+  * Stateless
+
+
+Mailing List Archives
+=====================
+
+http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/
+
+
+Mailing List Submissions
+========================
+
+devel@lists.orangefs.org
+
+
+Documentation
+=============
+
+http://www.orangefs.org/documentation/
+
+
+Userspace Filesystem Source
+===========================
+
+http://www.orangefs.org/download
+
+Orangefs versions prior to 2.9.3 are not compatible with the
+upstream version of the kernel client.
+
+
+Running ORANGEFS On a Single Server
+===================================
+
+OrangeFS is usually run in large installations with multiple servers and
+clients, but a complete filesystem can be run on a single machine for
+development and testing.
+
+On Fedora, install orangefs and orangefs-server::
+
+    dnf -y install orangefs orangefs-server
+
+There is an example server configuration file in
+/etc/orangefs/orangefs.conf.  Change localhost to your hostname if
+necessary.
+
+To generate a filesystem to run xfstests against, see below.
+
+There is an example client configuration file in /etc/pvfs2tab.  It is a
+single line.  Uncomment it and change the hostname if necessary.  This
+controls clients which use libpvfs2.  This does not control the
+pvfs2-client-core.
+
+Create the filesystem::
+
+    pvfs2-server -f /etc/orangefs/orangefs.conf
+
+Start the server::
+
+    systemctl start orangefs-server
+
+Test the server::
+
+    pvfs2-ping -m /pvfsmnt
+
+Start the client.  The module must be compiled in or loaded before this
+point::
+
+    systemctl start orangefs-client
+
+Mount the filesystem::
+
+    mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
+
+
+Building ORANGEFS on a Single Server
+====================================
+
+Where OrangeFS cannot be installed from distribution packages, it may be
+built from source.
+
+You can omit --prefix if you don't care that things are sprinkled around
+in /usr/local.  As of version 2.9.6, OrangeFS uses Berkeley DB by
+default; we will probably be changing the default to LMDB soon.  The
+configure line below selects LMDB explicitly with --with-db-backend=lmdb.
+
+::
+
+    ./configure --prefix=/opt/ofs --with-db-backend=lmdb
+
+    make
+
+    make install
+
+Create an orangefs config file::
+
+    /opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
+
+Create an /etc/pvfs2tab file::
+
+    echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
+	/etc/pvfs2tab
+
+Create the mount point you specified in the tab file if needed::
+
+    mkdir /pvfsmnt
+
+Bootstrap the server::
+
+    /opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
+
+Start the server::
+
+    /opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf
+
+Now the server should be running. Pvfs2-ls is a simple
+test to verify that the server is running::
+
+    /opt/ofs/bin/pvfs2-ls /pvfsmnt
+
+If stuff seems to be working, load the kernel module and
+turn on the client core::
+
+    /opt/ofs/sbin/pvfs2-client -p /opt/ofs/sbin/pvfs2-client-core
+
+Mount your filesystem::
+
+    mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
+
+
+Running xfstests
+================
+
+It is useful to use a scratch filesystem with xfstests.  This can be
+done with only one server.
+
+Make a second copy of the FileSystem section in the server configuration
+file, which is /etc/orangefs/orangefs.conf.  Change the Name to scratch.
+Change the ID to something other than the ID of the first FileSystem
+section (2 is usually a good choice).
+
+Then there are two FileSystem sections: orangefs and scratch.
+
+This change should be made before creating the filesystem.
+
+::
+
+    pvfs2-server -f /etc/orangefs/orangefs.conf
+
+To run xfstests, create /etc/xfsqa.config::
+
+    TEST_DIR=/orangefs
+    TEST_DEV=tcp://localhost:3334/orangefs
+    SCRATCH_MNT=/scratch
+    SCRATCH_DEV=tcp://localhost:3334/scratch
+
+Then xfstests can be run::
+
+    ./check -pvfs2
+
+
+Options
+=======
+
+The following mount options are accepted:
+
+  acl
+    Allow the use of Access Control Lists on files and directories.
+
+  intr
+    Some operations between the kernel client and the user space
+    filesystem can be made interruptible, such as changes in debug levels
+    and the setting of tunable parameters.
+
+  local_lock
+    Enable POSIX locking from the perspective of "this" kernel. The
+    default file_operations lock action is to return ENOSYS. POSIX
+    locking kicks in if the filesystem is mounted with -o local_lock.
+    Distributed locking is being worked on.
+
+
+Debugging
+=========
+
+To send the debug (GOSSIP) statements from a particular source file
+(inode.c, for example) to syslog::
+
+  echo inode > /sys/kernel/debug/orangefs/kernel-debug
+
+No debugging (the default)::
+
+  echo none > /sys/kernel/debug/orangefs/kernel-debug
+
+Debugging from several source files::
+
+  echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
+
+All debugging::
+
+  echo all > /sys/kernel/debug/orangefs/kernel-debug
+
+Get a list of all debugging keywords::
+
+  cat /sys/kernel/debug/orangefs/debug-help
+
+
+Protocol between Kernel Module and Userspace
+============================================
+
+Orangefs is a user space filesystem and an associated kernel module.
+We'll just refer to the user space part of Orangefs as "userspace"
+from here on out. Orangefs descends from PVFS, and userspace code
+still uses PVFS for function and variable names. Userspace typedefs
+many of the important structures. Function and variable names in
+the kernel module have been transitioned to "orangefs", and the Linux
+coding style avoids typedefs, so kernel module structures that
+correspond to userspace structures are not typedefed.
+
+The kernel module implements a pseudo device that userspace
+can read from and write to. Userspace can also manipulate the
+kernel module through the pseudo device with ioctl.
+
+The Bufmap
+----------
+
+At startup userspace allocates two page-size-aligned (posix_memalign)
+mlocked memory buffers: one is used for IO and one for readdir
+operations. The IO buffer is 41943040 bytes (40 MiB) and the readdir
+buffer is 4194304 bytes (4 MiB). Each buffer contains logical chunks, or
+partitions, and a pointer to each buffer is added to its own
+PVFS_dev_map_desc structure, which also describes its total size, as well
+as the size and number of the partitions.
+
+A pointer to the IO buffer's PVFS_dev_map_desc structure is sent to a
+mapping routine in the kernel module with an ioctl. The structure is
+copied from user space to kernel space with copy_from_user and is used
+to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
+then contains:
+
+  * refcnt - a reference counter
+  * desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
+    partition size, which represents the filesystem's block size and
+    is used for s_blocksize in super blocks.
+  * desc_count - PVFS2_BUFMAP_DEFAULT_DESC_COUNT (10) - the number of
+    partitions in the IO buffer.
+  * desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
+  * total_size - the total size of the IO buffer.
+  * page_count - the number of 4096 byte pages in the IO buffer.
+  * page_array - a pointer to ``page_count * (sizeof(struct page*))`` bytes
+    of kcalloced memory. This memory is used as an array of pointers
+    to each of the pages in the IO buffer through a call to get_user_pages.
+  * desc_array - a pointer to ``desc_count * (sizeof(struct orangefs_bufmap_desc))``
+    bytes of kcalloced memory. This memory is further initialized:
+
+      user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
+      structure. user_desc->ptr points to the IO buffer.
+
+      ::
+
+        pages_per_desc = bufmap->desc_size / PAGE_SIZE
+        offset = 0
+
+        bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
+        bufmap->desc_array[0].array_count = pages_per_desc = 1024
+        bufmap->desc_array[0].uaddr = (user_desc->ptr) + (0 * 1024 * 4096)
+        offset += 1024
+                           .
+                           .
+                           .
+        bufmap->desc_array[9].page_array = &bufmap->page_array[offset]
+        bufmap->desc_array[9].array_count = pages_per_desc = 1024
+        bufmap->desc_array[9].uaddr = (user_desc->ptr) + (9 * 1024 * 4096)
+        offset += 1024
+
+  * buffer_index_array - a desc_count sized array of ints, used to
+    indicate which of the IO buffer's partitions are available to use.
+  * buffer_index_lock - a spinlock to protect buffer_index_array during update.
+  * readdir_index_array - a five (ORANGEFS_READDIR_DEFAULT_DESC_COUNT) element
+    int array used to indicate which of the readdir buffer's partitions are
+    available to use.
+  * readdir_index_lock - a spinlock to protect readdir_index_array during
+    update.
+
+Operations
+----------
+
+The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
+needs to communicate with userspace. Part of the op contains the "upcall"
+which expresses the request to userspace. Part of the op eventually
+contains the "downcall" which expresses the results of the request.
+
+The slab allocator is used to keep a cache of op structures handy.
+
+At init time the kernel module defines and initializes a request list
+and an in_progress hash table to keep track of all the ops that are
+in flight at any given time.
+
+Ops are stateful:
+
+ * unknown - op was just initialized
+ * waiting - op is on request_list (upward bound)
+ * inprogr - op is in progress (waiting for downcall)
+ * serviced - op has matching downcall; ok
+ * purged - op has to start a timer since client-core
+   exited uncleanly before servicing op
+ * given up - submitter has given up waiting for it
+
+When some arbitrary userspace program needs to perform a
+filesystem operation on Orangefs (readdir, I/O, create, whatever)
+an op structure is initialized and tagged with a distinguishing ID
+number. The upcall part of the op is filled out, and the op is
+passed to the "service_operation" function.
+
+The service_operation function changes the op's state to "waiting", puts
+it on the request list, and signals the Orangefs file_operations.poll
+function through a wait queue. Userspace is polling the pseudo-device
+and thus becomes aware of the upcall request that needs to be read.
+
+When the Orangefs file_operations.read function is triggered, the
+request list is searched for an op that seems ready-to-process.
+The op is removed from the request list. The tag from the op and
+the filled-out upcall struct are copy_to_user'ed back to userspace.
+
+If any of these (and some additional protocol) copy_to_users fail,
+the op's state is set to "waiting" and the op is added back to
+the request list. Otherwise, the op's state is changed to "in progress",
+and the op is hashed on its tag and put onto the end of a list in the
+in_progress hash table at the index the tag hashed to.
+
+When userspace has assembled the response to the upcall, it
+writes the response, which includes the distinguishing tag, back to
+the pseudo device in a series of io_vecs. This triggers the Orangefs
+file_operations.write_iter function to find the op with the associated
+tag and remove it from the in_progress hash table. As long as the op's
+state is not "canceled" or "given up", its state is set to "serviced".
+The file_operations.write_iter function returns to the waiting vfs,
+and back to service_operation through wait_for_matching_downcall.
+
+service_operation returns to its caller with the op's downcall
+part (the response to the upcall) filled out.
+
+The "client-core" is the bridge between the kernel module and
+userspace. The client-core is a daemon. The client-core has an
+associated watchdog daemon. If the client-core is ever signaled
+to die, the watchdog daemon restarts the client-core. Even though
+the client-core is restarted "right away", there is a period of
+time during such an event that the client-core is dead. A dead client-core
+can't be triggered by the Orangefs file_operations.poll function.
+Ops that pass through service_operation during a "dead spell" can time out
+on the wait queue and one attempt is made to recycle them. Obviously,
+if the client-core stays dead too long, the arbitrary userspace processes
+trying to use Orangefs will be negatively affected. Waiting ops
+that can't be serviced will be removed from the request list and
+have their states set to "given up". In-progress ops that can't
+be serviced will be removed from the in_progress hash table and
+have their states set to "given up".
+
+Readdir and I/O ops are atypical with respect to their payloads.
+
+  - readdir ops use the smaller of the two pre-allocated pre-partitioned
+    memory buffers. The readdir buffer is only available to userspace.
+    The kernel module obtains an index to a free partition before launching
+    a readdir op. Userspace deposits the results into the indexed partition
+    and then writes them back to the pvfs device.
+
+  - io (read and write) ops use the larger of the two pre-allocated
+    pre-partitioned memory buffers. The IO buffer is accessible from
+    both userspace and the kernel module. The kernel module obtains an
+    index to a free partition before launching an io op. The kernel module
+    deposits write data into the indexed partition, to be consumed
+    directly by userspace. Userspace deposits the results of read
+    requests into the indexed partition, to be consumed directly
+    by the kernel module.
+
+Responses to kernel requests are all packaged in pvfs2_downcall_t
+structs. Besides a few other members, pvfs2_downcall_t contains a
+union of structs, each of which is associated with a particular
+response type.
+
+The several members outside of the union are:
+
+ ``int32_t type``
+    - type of operation.
+ ``int32_t status``
+    - return code for the operation.
+ ``int64_t trailer_size``
+    - 0 unless readdir operation.
+ ``char *trailer_buf``
+    - initialized to NULL, used during readdir operations.
+
+The appropriate member inside the union is filled out for any
+particular response.
+
+  PVFS2_VFS_OP_FILE_IO
+    fill a pvfs2_io_response_t
+
+  PVFS2_VFS_OP_LOOKUP
+    fill a PVFS_object_kref
+
+  PVFS2_VFS_OP_CREATE
+    fill a PVFS_object_kref
+
+  PVFS2_VFS_OP_SYMLINK
+    fill a PVFS_object_kref
+
+  PVFS2_VFS_OP_GETATTR
+    fill in a PVFS_sys_attr_s (tons of stuff the kernel doesn't need)
+    fill in a string with the link target when the object is a symlink.
+
+  PVFS2_VFS_OP_MKDIR
+    fill a PVFS_object_kref
+
+  PVFS2_VFS_OP_STATFS
+    fill a pvfs2_statfs_response_t with useless info. It is hard for
+    us to know these statistics about our distributed network
+    filesystem in a timely fashion.
+
+  PVFS2_VFS_OP_FS_MOUNT
+    fill a pvfs2_fs_mount_response_t which is just like a PVFS_object_kref
+    except its members are in a different order and "__pad1" is replaced
+    with "id".
+
+  PVFS2_VFS_OP_GETXATTR
+    fill a pvfs2_getxattr_response_t
+
+  PVFS2_VFS_OP_LISTXATTR
+    fill a pvfs2_listxattr_response_t
+
+  PVFS2_VFS_OP_PARAM
+    fill a pvfs2_param_response_t
+
+  PVFS2_VFS_OP_PERF_COUNT
+    fill a pvfs2_perf_count_response_t
+
+  PVFS2_VFS_OP_FSKEY
+    fill a pvfs2_fs_key_response_t
+
+  PVFS2_VFS_OP_READDIR
+    jamb everything needed to represent a pvfs2_readdir_response_t into
+    the readdir buffer descriptor specified in the upcall.
+
+Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
+made by the kernel side.
+
+A buffer_list containing:
+
+  - a pointer to the prepared response to the request from the
+    kernel (struct pvfs2_downcall_t).
+  - and also, in the case of a readdir request, a pointer to a
+    buffer containing descriptors for the objects in the target
+    directory.
+
+... is sent to the function (PINT_dev_write_list) which performs
+the writev.
+
+PINT_dev_write_list has a local iovec array: ``struct iovec io_array[10];``
+
+The first four elements of io_array are initialized like this for all
+responses::
+
+  io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
+  io_array[0].iov_len = sizeof(int32_t)
+
+  io_array[1].iov_base = address of global variable "pdev_magic" (int32_t)
+  io_array[1].iov_len = sizeof(int32_t)
+
+  io_array[2].iov_base = address of parameter "tag" (PVFS_id_gen_t)
+  io_array[2].iov_len = sizeof(int64_t)
+
+  io_array[3].iov_base = address of out_downcall member (pvfs2_downcall_t)
+                         of global variable vfs_request (vfs_request_t)
+  io_array[3].iov_len = sizeof(pvfs2_downcall_t)
+
+Readdir responses initialize the fifth element of io_array like this::
+
+  io_array[4].iov_base = contents of member trailer_buf (char *)
+                         from out_downcall member of global variable
+                         vfs_request
+  io_array[4].iov_len = contents of member trailer_size (PVFS_size)
+                        from out_downcall member of global variable
+                        vfs_request
+
+Orangefs exploits the dcache in order to avoid sending redundant
+requests to userspace. We keep object inode attributes up-to-date with
+orangefs_inode_getattr. Orangefs_inode_getattr uses two arguments to
+help it decide whether or not to update an inode: "new" and "bypass".
+Orangefs keeps private data in an object's inode that includes a short
+timeout value, getattr_time, which allows any iteration of
+orangefs_inode_getattr to know how long it has been since the inode was
+updated. When the object is not new (new == 0) and the bypass flag is not
+set (bypass == 0) orangefs_inode_getattr returns without updating the inode
+if getattr_time has not timed out. Getattr_time is updated each time the
+inode is updated.
+
+Creation of a new object (file, dir, sym-link) includes the evaluation of
+its pathname, resulting in a negative directory entry for the object.
+A new inode is allocated and associated with the dentry, turning it from
+a negative dentry into a "productive full member of society". Orangefs
+obtains the new inode from Linux with new_inode() and associates
+the inode with the dentry by sending the pair back to Linux with
+d_instantiate().
+
+The evaluation of a pathname for an object resolves to its corresponding
+dentry. If there is no corresponding dentry, one is created for it in
+the dcache. Whenever a dentry is modified or verified Orangefs stores a
+short timeout value in the dentry's d_time, and the dentry will be trusted
+for that amount of time. Orangefs is a network filesystem, and objects
+can potentially change out-of-band with any particular Orangefs kernel module
+instance, so trusting a dentry is risky. The alternative to trusting
+dentries is to always obtain the needed information from userspace - at
+least a trip to the client-core, maybe to the servers. Obtaining information
+from a dentry is cheap, obtaining it from userspace is relatively expensive,
+hence the motivation to use the dentry when possible.
+
+The timeout values d_time and getattr_time are jiffy based, and the
+code is designed to avoid the jiffy-wrap problem::
+
+    "In general, if the clock may have wrapped around more than once, there
+    is no way to tell how much time has elapsed. However, if the times t1
+    and t2 are known to be fairly close, we can reliably compute the
+    difference in a way that takes into account the possibility that the
+    clock may have wrapped between times."
+
+from course notes by instructor Andy Wang
+
diff --git a/Documentation/filesystems/orangefs.txt b/Documentation/filesystems/orangefs.txt
deleted file mode 100644
index f4ba94950e3f..000000000000
--- a/Documentation/filesystems/orangefs.txt
+++ /dev/null
@@ -1,529 +0,0 @@
-ORANGEFS
-========
-
-OrangeFS is an LGPL userspace scale-out parallel storage system. It is ideal
-for large storage problems faced by HPC, BigData, Streaming Video,
-Genomics, Bioinformatics.
-
-Orangefs, originally called PVFS, was first developed in 1993 by
-Walt Ligon and Eric Blumer as a parallel file system for Parallel
-Virtual Machine (PVM) as part of a NASA grant to study the I/O patterns
-of parallel programs.
-
-Orangefs features include:
-
-  * Distributes file data among multiple file servers
-  * Supports simultaneous access by multiple clients
-  * Stores file data and metadata on servers using local file system
-    and access methods
-  * Userspace implementation is easy to install and maintain
-  * Direct MPI support
-  * Stateless
-
-
-MAILING LIST ARCHIVES
-=====================
-
-http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/
-
-
-MAILING LIST SUBMISSIONS
-========================
-
-devel@lists.orangefs.org
-
-
-DOCUMENTATION
-=============
-
-http://www.orangefs.org/documentation/
-
-
-USERSPACE FILESYSTEM SOURCE
-===========================
-
-http://www.orangefs.org/download
-
-Orangefs versions prior to 2.9.3 would not be compatible with the
-upstream version of the kernel client.
-
-
-RUNNING ORANGEFS ON A SINGLE SERVER
-===================================
-
-OrangeFS is usually run in large installations with multiple servers and
-clients, but a complete filesystem can be run on a single machine for
-development and testing.
-
-On Fedora, install orangefs and orangefs-server.
-
-dnf -y install orangefs orangefs-server
-
-There is an example server configuration file in
-/etc/orangefs/orangefs.conf.  Change localhost to your hostname if
-necessary.
-
-To generate a filesystem to run xfstests against, see below.
-
-There is an example client configuration file in /etc/pvfs2tab.  It is a
-single line.  Uncomment it and change the hostname if necessary.  This
-controls clients which use libpvfs2.  This does not control the
-pvfs2-client-core.
-
-Create the filesystem.
-
-pvfs2-server -f /etc/orangefs/orangefs.conf
-
-Start the server.
-
-systemctl start orangefs-server
-
-Test the server.
-
-pvfs2-ping -m /pvfsmnt
-
-Start the client.  The module must be compiled in or loaded before this
-point.
-
-systemctl start orangefs-client
-
-Mount the filesystem.
-
-mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
-
-
-BUILDING ORANGEFS ON A SINGLE SERVER
-====================================
-
-Where OrangeFS cannot be installed from distribution packages, it may be
-built from source.
-
-You can omit --prefix if you don't care that things are sprinkled around
-in /usr/local.  As of version 2.9.6, OrangeFS uses Berkeley DB by
-default, we will probably be changing the default to LMDB soon.
-
-./configure --prefix=/opt/ofs --with-db-backend=lmdb
-
-make
-
-make install
-
-Create an orangefs config file.
-
-/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
-
-Create an /etc/pvfs2tab file.
-
-echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
-    /etc/pvfs2tab
-
-Create the mount point you specified in the tab file if needed.
-
-mkdir /pvfsmnt
-
-Bootstrap the server.
-
-/opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
-
-Start the server.
-
-/opt/osf/sbin/pvfs2-server /etc/pvfs2.conf
-
-Now the server should be running. Pvfs2-ls is a simple
-test to verify that the server is running.
-
-/opt/ofs/bin/pvfs2-ls /pvfsmnt
-
-If stuff seems to be working, load the kernel module and
-turn on the client core.
-
-/opt/ofs/sbin/pvfs2-client -p /opt/osf/sbin/pvfs2-client-core
-
-Mount your filesystem.
-
-mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
-
-
-RUNNING XFSTESTS
-================
-
-It is useful to use a scratch filesystem with xfstests.  This can be
-done with only one server.
-
-Make a second copy of the FileSystem section in the server configuration
-file, which is /etc/orangefs/orangefs.conf.  Change the Name to scratch.
-Change the ID to something other than the ID of the first FileSystem
-section (2 is usually a good choice).
-
-Then there are two FileSystem sections: orangefs and scratch.
-
-This change should be made before creating the filesystem.
-
-pvfs2-server -f /etc/orangefs/orangefs.conf
-
-To run xfstests, create /etc/xfsqa.config.
-
-TEST_DIR=/orangefs
-TEST_DEV=tcp://localhost:3334/orangefs
-SCRATCH_MNT=/scratch
-SCRATCH_DEV=tcp://localhost:3334/scratch
-
-Then xfstests can be run
-
-./check -pvfs2
-
-
-OPTIONS
-=======
-
-The following mount options are accepted:
-
-  acl
-    Allow the use of Access Control Lists on files and directories.
-
-  intr
-    Some operations between the kernel client and the user space
-    filesystem can be interruptible, such as changes in debug levels
-    and the setting of tunable parameters.
-
-  local_lock
-    Enable posix locking from the perspective of "this" kernel. The
-    default file_operations lock action is to return ENOSYS. Posix
-    locking kicks in if the filesystem is mounted with -o local_lock.
-    Distributed locking is being worked on for the future.
-
-
-DEBUGGING
-=========
-
-If you want the debug (GOSSIP) statements in a particular
-source file (inode.c for example) go to syslog:
-
-  echo inode > /sys/kernel/debug/orangefs/kernel-debug
-
-No debugging (the default):
-
-  echo none > /sys/kernel/debug/orangefs/kernel-debug
-
-Debugging from several source files:
-
-  echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
-
-All debugging:
-
-  echo all > /sys/kernel/debug/orangefs/kernel-debug
-
-Get a list of all debugging keywords:
-
-  cat /sys/kernel/debug/orangefs/debug-help
-
-
-PROTOCOL BETWEEN KERNEL MODULE AND USERSPACE
-============================================
-
-Orangefs is a user space filesystem and an associated kernel module.
-We'll just refer to the user space part of Orangefs as "userspace"
-from here on out. Orangefs descends from PVFS, and userspace code
-still uses PVFS for function and variable names. Userspace typedefs
-many of the important structures. Function and variable names in
-the kernel module have been transitioned to "orangefs", and The Linux
-Coding Style avoids typedefs, so kernel module structures that
-correspond to userspace structures are not typedefed.
-
-The kernel module implements a pseudo device that userspace
-can read from and write to. Userspace can also manipulate the
-kernel module through the pseudo device with ioctl.
-
-THE BUFMAP:
-
-At startup userspace allocates two page-size-aligned (posix_memalign)
-mlocked memory buffers: one for IO and one for readdir operations.
-The IO buffer is 41943040 bytes (40 MiB) and the readdir buffer is
-4194304 bytes (4 MiB). Each buffer contains logical chunks, or partitions,
-and a pointer to each buffer is added to its own PVFS_dev_map_desc
-structure, which also describes its total size, as well as the size and
-number of the partitions.
-
-A pointer to the IO buffer's PVFS_dev_map_desc structure is sent to a
-mapping routine in the kernel module with an ioctl. The structure is
-copied from user space to kernel space with copy_from_user and is used
-to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
-then contains:
-
-  * refcnt - a reference counter
-  * desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
-    partition size, which represents the filesystem's block size and
-    is used for s_blocksize in super blocks.
-  * desc_count - PVFS2_BUFMAP_DEFAULT_DESC_COUNT (10) - the number of
-    partitions in the IO buffer.
-  * desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
-  * total_size - the total size of the IO buffer.
-  * page_count - the number of 4096 byte pages in the IO buffer.
-  * page_array - a pointer to page_count * (sizeof(struct page*)) bytes
-    of kcalloced memory. This memory is used as an array of pointers
-    to each of the pages in the IO buffer through a call to get_user_pages.
-  * desc_array - a pointer to desc_count * (sizeof(struct orangefs_bufmap_desc))
-    bytes of kcalloced memory. This memory is further initialized:
-
-      user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
-      structure. user_desc->ptr points to the IO buffer.
-
-      pages_per_desc = bufmap->desc_size / PAGE_SIZE
-      offset = 0
-
-        bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
-        bufmap->desc_array[0].array_count = pages_per_desc = 1024
-        bufmap->desc_array[0].uaddr = (user_desc->ptr) + (0 * 1024 * 4096)
-        offset += 1024
-                           .
-                           .
-                           .
-        bufmap->desc_array[9].page_array = &bufmap->page_array[offset]
-        bufmap->desc_array[9].array_count = pages_per_desc = 1024
-        bufmap->desc_array[9].uaddr = (user_desc->ptr) +
-                                               (9 * 1024 * 4096)
-        offset += 1024
-
-  * buffer_index_array - a desc_count sized array of ints, used to
-    indicate which of the IO buffer's partitions are available to use.
-  * buffer_index_lock - a spinlock to protect buffer_index_array during update.
-  * readdir_index_array - a five (ORANGEFS_READDIR_DEFAULT_DESC_COUNT) element
-    int array used to indicate which of the readdir buffer's partitions are
-    available to use.
-  * readdir_index_lock - a spinlock to protect readdir_index_array during
-    update.
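-
-The desc_array walk-through above can be sketched in C. This is a
-minimal, hypothetical model rather than kernel code: struct sketch_desc
-and sketch_init_descs() are illustrative names, and the sketch assumes
-4096-byte pages and the default 10 partitions of 4194304 bytes. Each
-partition gets 1024 consecutive page_array slots and a user address one
-desc_size past the previous partition:
-
-  #include <stddef.h>
-  #include <stdint.h>
-
-  /* Hypothetical model of one orangefs_bufmap_desc entry. */
-  struct sketch_desc {
-      size_t page_index;   /* first page_array slot of this partition */
-      size_t array_count;  /* pages in this partition */
-      uintptr_t uaddr;     /* user address of the partition */
-  };
-
-  #define SKETCH_PAGE_SIZE  4096UL     /* assumed page size */
-  #define SKETCH_DESC_SIZE  4194304UL  /* desc_size */
-  #define SKETCH_DESC_COUNT 10         /* desc_count */
-
-  /* Lay out SKETCH_DESC_COUNT partitions over an IO buffer at 'base'. */
-  static void sketch_init_descs(struct sketch_desc *desc, uintptr_t base)
-  {
-      size_t pages_per_desc = SKETCH_DESC_SIZE / SKETCH_PAGE_SIZE; /* 1024 */
-      size_t offset = 0;
-
-      for (size_t i = 0; i < SKETCH_DESC_COUNT; i++) {
-          desc[i].page_index = offset;
-          desc[i].array_count = pages_per_desc;
-          desc[i].uaddr = base + i * pages_per_desc * SKETCH_PAGE_SIZE;
-          offset += pages_per_desc;
-      }
-  }
-
-With these defaults the ten partitions cover 10 * 4194304 = 41943040
-bytes, exactly the size of the IO buffer.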
-
-OPERATIONS:
-
-The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
-needs to communicate with userspace. Part of the op contains the "upcall"
-which expresses the request to userspace. Part of the op eventually
-contains the "downcall" which expresses the results of the request.
-
-The slab allocator is used to keep a cache of op structures handy.
-
-At init time the kernel module defines and initializes a request list
-and an in_progress hash table to keep track of all the ops that are
-in flight at any given time.
-
-Ops are stateful:
-
- * unknown  - op was just initialized
- * waiting  - op is on request_list (upward bound)
- * inprogr  - op is in progress (waiting for downcall)
- * serviced - op has matching downcall; ok
- * purged   - op has to start a timer since client-core
-              exited uncleanly before servicing op
- * given up - submitter has given up waiting for it
-
-When some arbitrary userspace program needs to perform a
-filesystem operation on Orangefs (readdir, I/O, create, whatever)
-an op structure is initialized and tagged with a distinguishing ID
-number. The upcall part of the op is filled out, and the op is
-passed to the "service_operation" function.
-
-Service_operation changes the op's state to "waiting", puts
-it on the request list, and signals the Orangefs file_operations.poll
-function through a wait queue. Userspace is polling the pseudo-device
-and thus becomes aware of the upcall request that needs to be read.
-
-When the Orangefs file_operations.read function is triggered, the
-request list is searched for an op that seems ready-to-process.
-The op is removed from the request list. The tag from the op and
-the filled-out upcall struct are copy_to_user'ed back to userspace.
-
-If any of these (and some additional protocol) copy_to_users fail,
-the op's state is set to "waiting" and the op is added back to
-the request list. Otherwise, the op's state is changed to "in progress",
-and the op is hashed on its tag and put onto the end of a list in the
-in_progress hash table at the index the tag hashed to.
-
-When userspace has assembled the response to the upcall, it
-writes the response, which includes the distinguishing tag, back to
-the pseudo device in a series of io_vecs. This triggers the Orangefs
-file_operations.write_iter function to find the op with the associated
-tag and remove it from the in_progress hash table. As long as the op's
-state is not "canceled" or "given up", its state is set to "serviced".
-The file_operations.write_iter function returns to the waiting vfs,
-and back to service_operation through wait_for_matching_downcall.
-
-Service_operation returns to its caller with the op's downcall
-part (the response to the upcall) filled out.
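-
-The state changes described above can be condensed into a short C
-sketch. It is hypothetical: the enum names mirror the state list given
-earlier rather than kernel symbols, the "canceled" case is left out, and
-sketch_tag_bucket() is an assumed stand-in for however the kernel maps a
-tag to an in_progress hash table index:
-
-  #include <stdbool.h>
-  #include <stdint.h>
-
-  /* Illustrative names for the op states listed earlier. */
-  enum sketch_op_state {
-      OP_UNKNOWN, OP_WAITING, OP_INPROGR, OP_SERVICED, OP_PURGED, OP_GIVEN_UP
-  };
-
-  /* The read handler tried to copy the upcall to userspace. */
-  static enum sketch_op_state sketch_after_read(bool copy_ok)
-  {
-      /* A failed copy puts the op back on the request list as "waiting". */
-      return copy_ok ? OP_INPROGR : OP_WAITING;
-  }
-
-  /* write_iter found the op whose tag matches a downcall. */
-  static enum sketch_op_state sketch_on_downcall(enum sketch_op_state s)
-  {
-      /* An op whose submitter gave up is not marked serviced. */
-      return s == OP_GIVEN_UP ? s : OP_SERVICED;
-  }
-
-  /* Assumed stand-in for choosing an in_progress hash bucket from a tag. */
-  static unsigned long sketch_tag_bucket(uint64_t tag, unsigned long buckets)
-  {
-      return (unsigned long)(tag % buckets);
-  }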
-
-The "client-core" is the bridge between the kernel module and
-userspace. The client-core is a daemon. The client-core has an
-associated watchdog daemon. If the client-core is ever signaled
-to die, the watchdog daemon restarts the client-core. Even though
-the client-core is restarted "right away", there is a period of
-time during such an event that the client-core is dead. A dead client-core
-can't be triggered by the Orangefs file_operations.poll function.
-Ops that pass through service_operation during a "dead spell" can time out
-on the wait queue and one attempt is made to recycle them. Obviously,
-if the client-core stays dead too long, the arbitrary userspace processes
-trying to use Orangefs will be negatively affected. Waiting ops
-that can't be serviced will be removed from the request list and
-have their states set to "given up". In-progress ops that can't
-be serviced will be removed from the in_progress hash table and
-have their states set to "given up".
-
-Readdir and I/O ops are atypical with respect to their payloads.
-
-  - readdir ops use the smaller of the two pre-allocated pre-partitioned
-    memory buffers. The readdir buffer is only available to userspace.
-    The kernel module obtains an index to a free partition before launching
-    a readdir op. Userspace deposits the results into the indexed partition
-    and then writes them back to the pvfs device.
-
-  - io (read and write) ops use the larger of the two pre-allocated
-    pre-partitioned memory buffers. The IO buffer is accessible from
-    both userspace and the kernel module. The kernel module obtains an
-    index to a free partition before launching an io op. The kernel module
-    deposits write data into the indexed partition, to be consumed
-    directly by userspace. Userspace deposits the results of read
-    requests into the indexed partition, to be consumed directly
-    by the kernel module.
-
-Responses to kernel requests are all packaged in pvfs2_downcall_t
-structs. Besides a few other members, pvfs2_downcall_t contains a
-union of structs, each of which is associated with a particular
-response type.
-
-The several members outside of the union are:
- - int32_t type - type of operation.
- - int32_t status - return code for the operation.
- - int64_t trailer_size - 0 unless readdir operation.
- - char *trailer_buf - initialized to NULL, used during readdir operations.
-
-The appropriate member inside the union is filled out for any
-particular response.
-
-  PVFS2_VFS_OP_FILE_IO
-    fill a pvfs2_io_response_t
-
-  PVFS2_VFS_OP_LOOKUP
-    fill a PVFS_object_kref
-
-  PVFS2_VFS_OP_CREATE
-    fill a PVFS_object_kref
-
-  PVFS2_VFS_OP_SYMLINK
-    fill a PVFS_object_kref
-
-  PVFS2_VFS_OP_GETATTR
-    fill in a PVFS_sys_attr_s (tons of stuff the kernel doesn't need)
-    fill in a string with the link target when the object is a symlink.
-
-  PVFS2_VFS_OP_MKDIR
-    fill a PVFS_object_kref
-
-  PVFS2_VFS_OP_STATFS
-    fill a pvfs2_statfs_response_t with useless info <g>. It is hard for
-    us to know, in a timely fashion, these statistics about our
-    distributed network filesystem.
-
-  PVFS2_VFS_OP_FS_MOUNT
-    fill a pvfs2_fs_mount_response_t which is just like a PVFS_object_kref
-    except its members are in a different order and "__pad1" is replaced
-    with "id".
-
-  PVFS2_VFS_OP_GETXATTR
-    fill a pvfs2_getxattr_response_t
-
-  PVFS2_VFS_OP_LISTXATTR
-    fill a pvfs2_listxattr_response_t
-
-  PVFS2_VFS_OP_PARAM
-    fill a pvfs2_param_response_t
-
-  PVFS2_VFS_OP_PERF_COUNT
-    fill a pvfs2_perf_count_response_t
-
-  PVFS2_VFS_OP_FSKEY
-    fill a pvfs2_fs_key_response_t
-
-  PVFS2_VFS_OP_READDIR
-    jamb everything needed to represent a pvfs2_readdir_response_t into
-    the readdir buffer descriptor specified in the upcall.
-
-Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
-made by the kernel side.
-
-A buffer_list is sent to the function PINT_dev_write_list, which
-performs the writev. The buffer_list contains:
-  - a pointer to the prepared response to the request from the
-    kernel (struct pvfs2_downcall_t).
-  - in the case of a readdir request, a pointer to a buffer
-    containing descriptors for the objects in the target directory.
-
-PINT_dev_write_list has a local iovec array: struct iovec io_array[10];
-
-The first four elements of io_array are initialized like this for all
-responses:
-
-  io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
-  io_array[0].iov_len = sizeof(int32_t)
-
-  io_array[1].iov_base = address of global variable "pdev_magic" (int32_t)
-  io_array[1].iov_len = sizeof(int32_t)
-
-  io_array[2].iov_base = address of parameter "tag" (PVFS_id_gen_t)
-  io_array[2].iov_len = sizeof(int64_t)
-
-  io_array[3].iov_base = address of out_downcall member (pvfs2_downcall_t)
-                         of global variable vfs_request (vfs_request_t)
-  io_array[3].iov_len = sizeof(pvfs2_downcall_t)
-
-Readdir responses initialize the fifth element of io_array like this:
-
-  io_array[4].iov_base = contents of member trailer_buf (char *)
-                         from out_downcall member of global variable
-                         vfs_request
-  io_array[4].iov_len = contents of member trailer_size (PVFS_size)
-                        from out_downcall member of global variable
-                        vfs_request
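-
-The io_array setup can be sketched as follows. This is a simplified,
-hypothetical model: struct sketch_downcall keeps only the four members
-listed earlier instead of the full pvfs2_downcall_t union, and the magic
-value is supplied by the caller. A real writer would then pass the array
-and the returned count to writev() on the device file descriptor:
-
-  #include <stddef.h>
-  #include <stdint.h>
-  #include <sys/uio.h>
-
-  /* Simplified stand-in for pvfs2_downcall_t (union omitted). */
-  struct sketch_downcall {
-      int32_t type;
-      int32_t status;
-      int64_t trailer_size;
-      char *trailer_buf;
-  };
-
-  /* Fill io[] as described above; returns how many iovecs are in use. */
-  static int sketch_fill_iov(struct iovec io[5], int32_t *proto_ver,
-                             int32_t *magic, int64_t *tag,
-                             struct sketch_downcall *dc, int is_readdir)
-  {
-      io[0].iov_base = proto_ver;
-      io[0].iov_len = sizeof(int32_t);
-      io[1].iov_base = magic;
-      io[1].iov_len = sizeof(int32_t);
-      io[2].iov_base = tag;
-      io[2].iov_len = sizeof(int64_t);
-      io[3].iov_base = dc;
-      io[3].iov_len = sizeof(*dc);
-      if (!is_readdir)
-          return 4;
-      /* Readdir responses append the trailer buffer. */
-      io[4].iov_base = dc->trailer_buf;
-      io[4].iov_len = (size_t)dc->trailer_size;
-      return 5;
-  }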
-
-Orangefs exploits the dcache in order to avoid sending redundant
-requests to userspace. We keep object inode attributes up-to-date with
-orangefs_inode_getattr. Orangefs_inode_getattr uses two arguments to
-help it decide whether or not to update an inode: "new" and "bypass".
-Orangefs keeps private data in an object's inode that includes a short
-timeout value, getattr_time, which allows any iteration of
-orangefs_inode_getattr to know how long it has been since the inode was
-updated. When the object is not new (new == 0) and the bypass flag is not
-set (bypass == 0) orangefs_inode_getattr returns without updating the inode
-if getattr_time has not timed out. Getattr_time is updated each time the
-inode is updated.
-
-Creation of a new object (file, dir, sym-link) includes the evaluation of
-its pathname, resulting in a negative directory entry for the object.
-A new inode is allocated and associated with the dentry, turning it from
-a negative dentry into a "productive full member of society". Orangefs
-obtains the new inode from Linux with new_inode() and associates
-the inode with the dentry by sending the pair back to Linux with
-d_instantiate().
-
-The evaluation of a pathname for an object resolves to its corresponding
-dentry. If there is no corresponding dentry, one is created for it in
-the dcache. Whenever a dentry is modified or verified Orangefs stores a
-short timeout value in the dentry's d_time, and the dentry will be trusted
-for that amount of time. Orangefs is a network filesystem, and objects
-can potentially change out-of-band with any particular Orangefs kernel module
-instance, so trusting a dentry is risky. The alternative to trusting
-dentries is to always obtain the needed information from userspace - at
-least a trip to the client-core, maybe to the servers. Obtaining information
-from a dentry is cheap; obtaining it from userspace is relatively expensive,
-hence the motivation to use the dentry when possible.
-
-The timeout values d_time and getattr_time are jiffy-based, and the
-code is designed to avoid the jiffy-wrap problem:
-
-"In general, if the clock may have wrapped around more than once, there
-is no way to tell how much time has elapsed. However, if the times t1
-and t2 are known to be fairly close, we can reliably compute the
-difference in a way that takes into account the possibility that the
-clock may have wrapped between times."
-
-                      from course notes by instructor Andy Wang
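-
-The standard technique for such nearby times is to compare the signed
-difference of the two tick counts, which is what the kernel's time_after()
-macro does. Below is a minimal sketch of that comparison;
-sketch_still_valid() is a hypothetical helper showing how a stored expiry
-such as d_time or getattr_time could be checked against the current time:
-
-  #include <limits.h>   /* ULONG_MAX, used in the example below */
-  #include <stdbool.h>
-
-  /*
-   * True if tick count a is later than b, even across a wrap, provided
-   * the two are less than LONG_MAX ticks apart (same idea as time_after()).
-   */
-  static bool sketch_time_after(unsigned long a, unsigned long b)
-  {
-      return (long)(b - a) < 0;
-  }
-
-  /* Hypothetical check: is a value that expires at 'expires' still valid? */
-  static bool sketch_still_valid(unsigned long now, unsigned long expires)
-  {
-      return !sketch_time_after(now, expires);
-  }
-
-For example, sketch_time_after(5, ULONG_MAX - 5) is true, because 5 is
-11 ticks past ULONG_MAX - 5 once the counter wraps, while the naive test
-5 > ULONG_MAX - 5 is false.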
-
-- 
cgit 


From c33e97efa9d9de538e5f0afe6cb07f83afcd5b68 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:18 +0100
Subject: docs: filesystems: convert proc.txt to ReST

This document has a nice format! Unfortunately, not exactly
ReST. So, several adjustments were required:

- Add a SPDX header;
- Adjust document and section titles;
- Whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add table captions;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/1d113d860188de416ca3b0b97371dc2195433d5b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |    1 +
 Documentation/filesystems/proc.rst  | 2169 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/proc.txt  | 2047 ---------------------------------
 3 files changed, 2170 insertions(+), 2047 deletions(-)
 create mode 100644 Documentation/filesystems/proc.rst
 delete mode 100644 Documentation/filesystems/proc.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index fed53f831192..671906e2fee6 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -81,5 +81,6 @@ Documentation for filesystem implementations.
    omfs
    orangefs
    overlayfs
+   proc
    virtiofs
    vfat
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
new file mode 100644
index 000000000000..38b606991065
--- /dev/null
+++ b/Documentation/filesystems/proc.rst
@@ -0,0 +1,2169 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+The /proc Filesystem
+====================
+
+=====================  =======================================  ================
+/proc/sys              Terrehon Bowden <terrehon@pacbell.net>,  October 7 1999
+                       Bodo Bauer <bb@ricochet.net>
+2.4.x update	       Jorge Nerin <comandante@zaralinux.com>   November 14 2000
+move /proc/sys	       Shen Feng <shen@cn.fujitsu.com>	        April 1 2009
+fixes/update part 1.1  Stefani Seibold <stefani@seibold.net>    June 9 2009
+=====================  =======================================  ================
+
+
+
+.. Table of Contents
+
+  0     Preface
+  0.1	Introduction/Credits
+  0.2	Legal Stuff
+
+  1	Collecting System Information
+  1.1	Process-Specific Subdirectories
+  1.2	Kernel data
+  1.3	IDE devices in /proc/ide
+  1.4	Networking info in /proc/net
+  1.5	SCSI info
+  1.6	Parallel port info in /proc/parport
+  1.7	TTY info in /proc/tty
+  1.8	Miscellaneous kernel statistics in /proc/stat
+  1.9	Ext4 file system parameters
+
+  2	Modifying System Parameters
+
+  3	Per-Process Parameters
+  3.1	/proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
+								score
+  3.2	/proc/<pid>/oom_score - Display current oom-killer score
+  3.3	/proc/<pid>/io - Display the IO accounting fields
+  3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
+  3.5	/proc/<pid>/mountinfo - Information about mounts
+  3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
+  3.7   /proc/<pid>/task/<tid>/children - Information about task children
+  3.8   /proc/<pid>/fdinfo/<fd> - Information about opened file
+  3.9   /proc/<pid>/map_files - Information about memory mapped files
+  3.10  /proc/<pid>/timerslack_ns - Task timerslack value
+  3.11	/proc/<pid>/patch_state - Livepatch patch operation state
+  3.12	/proc/<pid>/arch_status - Task architecture specific information
+
+  4	Configuring procfs
+  4.1	Mount options
+
+Preface
+=======
+
+0.1 Introduction/Credits
+------------------------
+
+This documentation is  part of a soon (or  so we hope) to be  released book on
+the SuSE  Linux distribution. As  there is  no complete documentation  for the
+/proc file system and we've used  many freely available sources to write these
+chapters, it  seems only fair  to give the work  back to the  Linux community.
+This work is  based on the 2.2.*  kernel version and the  upcoming 2.4.*. I'm
+afraid it's still far from complete, but we  hope it will be useful. As far as
+we know, it is the first 'all-in-one' document about the /proc file system. It
+is focused  on the Intel  x86 hardware,  so if you  are looking for  PPC, ARM,
+SPARC, AXP, etc., features, you probably  won't find what you are looking for.
+It also only covers IPv4 networking, not IPv6 nor other protocols - sorry. But
+additions and patches  are welcome and will  be added to this  document if you
+mail them to Bodo.
+
+We'd like  to  thank Alan Cox, Rik van Riel, and Alexey Kuznetsov and a lot of
+other people for help compiling this documentation. We'd also like to extend a
+special thank  you to Andi Kleen for documentation, which we relied on heavily
+to create  this  document,  as well as the additional information he provided.
+Thanks to  everybody  else  who contributed source or docs to the Linux kernel
+and helped create a great piece of software... :)
+
+If you  have  any comments, corrections or additions, please don't hesitate to
+contact Bodo  Bauer  at  bb@ricochet.net.  We'll  be happy to add them to this
+document.
+
+The   latest   version    of   this   document   is    available   online   at
+http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
+
+If the above address does not work for you, you could try the kernel
+mailing list at linux-kernel@vger.kernel.org and/or try to reach me at
+comandante@zaralinux.com.
+
+0.2 Legal Stuff
+---------------
+
+We don't  guarantee  the  correctness  of this document, and if you come to us
+complaining about  how  you  screwed  up  your  system  because  of  incorrect
+documentation, we won't feel responsible...
+
+Chapter 1: Collecting System Information
+========================================
+
+In This Chapter
+---------------
+* Investigating  the  properties  of  the  pseudo  file  system  /proc and its
+  ability to provide information on the running Linux system
+* Examining /proc's structure
+* Uncovering  various  information  about the kernel and the processes running
+  on the system
+
+------------------------------------------------------------------------------
+
+The proc  file  system acts as an interface to internal data structures in the
+kernel. It  can  be  used to obtain information about the system and to change
+certain kernel parameters at runtime (sysctl).
+
+First, we'll  take  a  look  at the read-only parts of /proc. In Chapter 2, we
+show you how you can use /proc/sys to change settings.
+
+1.1 Process-Specific Subdirectories
+-----------------------------------
+
+The directory  /proc  contains  (among other things) one subdirectory for each
+process running on the system, which is named after the process ID (PID).
+
+The link  self  points  to  the  process reading the file system. Each process
+subdirectory has the entries listed in Table 1-1.
+
+Note that an open file descriptor to /proc/<pid> or to any of its
+contained files or subdirectories does not prevent <pid> from being reused
+for some other process in the event that <pid> exits. Operations on
+open /proc/<pid> file descriptors corresponding to dead processes
+never act on any new process that the kernel may, through chance, have
+also assigned the process ID <pid>. Instead, operations on these FDs
+usually fail with ESRCH.
+
+.. table:: Table 1-1: Process specific entries in /proc
+
+ =============  ===============================================================
+ File		Content
+ =============  ===============================================================
+ clear_refs	Clears page referenced bits shown in smaps output
+ cmdline	Command line arguments
+ cpu		Current and last cpu in which it was executed	(2.4)(smp)
+ cwd		Link to the current working directory
+ environ	Values of environment variables
+ exe		Link to the executable of this process
+ fd		Directory, which contains all file descriptors
+ maps		Memory maps to executables and library files	(2.4)
+ mem		Memory held by this process
+ root		Link to the root directory of this process
+ stat		Process status
+ statm		Process memory status information
+ status		Process status in human readable form
+ wchan		Present with CONFIG_KALLSYMS=y: it shows the kernel function
+		symbol the task is blocked in - or "0" if not blocked.
+ pagemap	Page table
+ stack		Report full stack trace, enable via CONFIG_STACKTRACE
+ smaps		An extension based on maps, showing the memory consumption of
+		each mapping and flags associated with it
+ smaps_rollup	Accumulated smaps stats for all mappings of the process.  This
+		can be derived from smaps, but is faster and more convenient
+ numa_maps	An extension based on maps, showing the memory locality and
+		binding policy as well as mem usage (in pages) of each mapping.
+ =============  ===============================================================
+
+For example, to get the status information of a process, all you have to do is
+read the file /proc/PID/status::
+
+  >cat /proc/self/status
+  Name:   cat
+  State:  R (running)
+  Tgid:   5452
+  Pid:    5452
+  PPid:   743
+  TracerPid:      0						(2.4)
+  Uid:    501     501     501     501
+  Gid:    100     100     100     100
+  FDSize: 256
+  Groups: 100 14 16
+  VmPeak:     5004 kB
+  VmSize:     5004 kB
+  VmLck:         0 kB
+  VmHWM:       476 kB
+  VmRSS:       476 kB
+  RssAnon:             352 kB
+  RssFile:             120 kB
+  RssShmem:              4 kB
+  VmData:      156 kB
+  VmStk:        88 kB
+  VmExe:        68 kB
+  VmLib:      1412 kB
+  VmPTE:        20 kB
+  VmSwap:        0 kB
+  HugetlbPages:          0 kB
+  CoreDumping:    0
+  THP_enabled:	  1
+  Threads:        1
+  SigQ:   0/28578
+  SigPnd: 0000000000000000
+  ShdPnd: 0000000000000000
+  SigBlk: 0000000000000000
+  SigIgn: 0000000000000000
+  SigCgt: 0000000000000000
+  CapInh: 00000000fffffeff
+  CapPrm: 0000000000000000
+  CapEff: 0000000000000000
+  CapBnd: ffffffffffffffff
+  CapAmb: 0000000000000000
+  NoNewPrivs:     0
+  Seccomp:        0
+  Speculation_Store_Bypass:       thread vulnerable
+  voluntary_ctxt_switches:        0
+  nonvoluntary_ctxt_switches:     1
+
+This shows you nearly the same information you would get if you viewed it with
+the ps command. In fact, ps uses the proc file system to obtain its
+information. But you get a more detailed view of the process by reading the
+file /proc/PID/status. Its fields are described in Table 1-2.
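+
+As a minimal sketch, a C program can pull one field out of this file by
+matching the "Key:" prefix of each line. The helper name status_field() is
+hypothetical, not a kernel or libc interface::
+
+  #include <stdio.h>
+  #include <string.h>
+
+  /*
+   * Copy the value of the "key:" line of a status file into buf.
+   * Returns 0 on success, -1 if the file cannot be opened or the key
+   * is not found. Lines longer than 255 bytes are read in pieces.
+   */
+  static int status_field(const char *path, const char *key,
+                          char *buf, size_t len)
+  {
+      char line[256];
+      size_t klen = strlen(key);
+      int ret = -1;
+      FILE *f = fopen(path, "r");
+
+      if (!f)
+          return -1;
+      while (fgets(line, sizeof(line), f)) {
+          if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
+              char *v = line + klen + 1;
+
+              v += strspn(v, " \t");       /* skip alignment padding */
+              v[strcspn(v, "\n")] = '\0';  /* drop the newline */
+              snprintf(buf, len, "%s", v);
+              ret = 0;
+              break;
+          }
+      }
+      fclose(f);
+      return ret;
+  }
+
+For example, status_field("/proc/self/status", "VmRSS", buf, sizeof(buf))
+stores the text after "VmRSS:", such as "476 kB", in buf.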
+
+The statm file contains more detailed information about the process
+memory usage. Its seven fields are explained in Table 1-3. The stat file
+contains detailed information about the process itself. Its fields are
+explained in Table 1-4.
+
+(for SMP CONFIG users)
+
+To make accounting scalable, RSS-related information is handled
+asynchronously, and the value may not be very precise. For a precise
+snapshot at a given moment, read the /proc/<pid>/smaps file, which scans
+the page table. It's slow but very precise.
+
+.. table:: Table 1-2: Contents of the status files (as of 4.19)
+
+ ==========================  ===================================================
+ Field                       Content
+ ==========================  ===================================================
+ Name                        filename of the executable
+ Umask                       file mode creation mask
+ State                       state (R is running, S is sleeping, D is sleeping
+                             in an uninterruptible wait, Z is zombie,
+			     T is traced or stopped)
+ Tgid                        thread group ID
+ Ngid                        NUMA group ID (0 if none)
+ Pid                         process id
+ PPid                        process id of the parent process
+ TracerPid                   PID of process tracing this process (0 if not)
+ Uid                         Real, effective, saved set, and file system UIDs
+ Gid                         Real, effective, saved set, and file system GIDs
+ FDSize                      number of file descriptor slots currently allocated
+ Groups                      supplementary group list
+ NStgid                      descendant namespace thread group ID hierarchy
+ NSpid                       descendant namespace process ID hierarchy
+ NSpgid                      descendant namespace process group ID hierarchy
+ NSsid                       descendant namespace session ID hierarchy
+ VmPeak                      peak virtual memory size
+ VmSize                      total program size
+ VmLck                       locked memory size
+ VmPin                       pinned memory size
+ VmHWM                       peak resident set size ("high water mark")
+ VmRSS                       size of memory portions. It contains the three
+                             following parts
+                             (VmRSS = RssAnon + RssFile + RssShmem)
+ RssAnon                     size of resident anonymous memory
+ RssFile                     size of resident file mappings
+ RssShmem                    size of resident shmem memory (includes SysV shm,
+                             mapping of tmpfs and shared anonymous mappings)
+ VmData                      size of private data segments
+ VmStk                       size of stack segments
+ VmExe                       size of text segment
+ VmLib                       size of shared library code
+ VmPTE                       size of page table entries
+ VmSwap                      amount of swap used by anonymous private data
+                             (shmem swap usage is not included)
+ HugetlbPages                size of hugetlb memory portions
+ CoreDumping                 process's memory is currently being dumped
+                             (killing the process may lead to a corrupted core)
+ THP_enabled                 process is allowed to use THP (returns 0 when
+                             PR_SET_THP_DISABLE is set on the process)
+ Threads                     number of threads
+ SigQ                        number of signals queued/max. number for queue
+ SigPnd                      bitmap of pending signals for the thread
+ ShdPnd                      bitmap of shared pending signals for the process
+ SigBlk                      bitmap of blocked signals
+ SigIgn                      bitmap of ignored signals
+ SigCgt                      bitmap of caught signals
+ CapInh                      bitmap of inheritable capabilities
+ CapPrm                      bitmap of permitted capabilities
+ CapEff                      bitmap of effective capabilities
+ CapBnd                      bitmap of capabilities bounding set
+ CapAmb                      bitmap of ambient capabilities
+ NoNewPrivs                  no_new_privs, like prctl(PR_GET_NO_NEW_PRIVS, ...)
+ Seccomp                     seccomp mode, like prctl(PR_GET_SECCOMP, ...)
+ Speculation_Store_Bypass    speculative store bypass mitigation status
+ Cpus_allowed                mask of CPUs on which this process may run
+ Cpus_allowed_list           Same as previous, but in "list format"
+ Mems_allowed                mask of memory nodes allowed to this process
+ Mems_allowed_list           Same as previous, but in "list format"
+ voluntary_ctxt_switches     number of voluntary context switches
+ nonvoluntary_ctxt_switches  number of non voluntary context switches
+ ==========================  ===================================================
+
+
+.. table:: Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
+
+ ======== ===============================	==============================
+ Field    Content
+ ======== ===============================	==============================
+ size     total program size (pages)		(same as VmSize in status)
+ resident size of memory portions (pages)	(same as VmRSS in status)
+ shared   number of pages that are shared	(i.e. backed by a file, same
+						as RssFile+RssShmem in status)
+ trs      number of pages that are 'code'	(not including libs; broken,
+						includes data segment)
+ lrs      number of pages of library		(always 0 on 2.6)
+ drs      number of pages of data/stack		(including libs; broken,
+						includes library text)
+ dt       number of dirty pages			(always 0 on 2.6)
+ ======== ===============================	==============================
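+
+Because statm is a single line of seven numbers, it can be read with one
+fscanf() call. A minimal sketch follows; struct statm_info and read_statm()
+are hypothetical names. The values are in pages, so multiply them by
+sysconf(_SC_PAGESIZE) to get bytes::
+
+  #include <stdio.h>
+
+  /* The seven statm fields, in Table 1-3 order; all counts are in pages. */
+  struct statm_info {
+      unsigned long size, resident, shared, trs, lrs, drs, dt;
+  };
+
+  /* Returns 0 if all seven fields were read, -1 otherwise. */
+  static int read_statm(const char *path, struct statm_info *m)
+  {
+      int n;
+      FILE *f = fopen(path, "r");
+
+      if (!f)
+          return -1;
+      n = fscanf(f, "%lu %lu %lu %lu %lu %lu %lu",
+                 &m->size, &m->resident, &m->shared, &m->trs,
+                 &m->lrs, &m->drs, &m->dt);
+      fclose(f);
+      return n == 7 ? 0 : -1;
+  }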
+
+
+.. table:: Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
+
+  ============= ===============================================================
+  Field         Content
+  ============= ===============================================================
+  pid           process id
+  tcomm         filename of the executable
+  state         state (R is running, S is sleeping, D is sleeping in an
+                uninterruptible wait, Z is zombie, T is traced or stopped)
+  ppid          process id of the parent process
+  pgrp          pgrp of the process
+  sid           session id
+  tty_nr        tty the process uses
+  tty_pgrp      pgrp of the tty
+  flags         task flags
+  min_flt       number of minor faults
+  cmin_flt      number of minor faults of waited-for children
+  maj_flt       number of major faults
+  cmaj_flt      number of major faults of waited-for children
+  utime         user mode jiffies
+  stime         kernel mode jiffies
+  cutime        user mode jiffies of waited-for children
+  cstime        kernel mode jiffies of waited-for children
+  priority      priority level
+  nice          nice level
+  num_threads   number of threads
+  it_real_value	(obsolete, always 0)
+  start_time    time the process started after system boot
+  vsize         virtual memory size
+  rss           resident set memory size
+  rsslim        current limit in bytes on the rss
+  start_code    address above which program text can run
+  end_code      address below which program text can run
+  start_stack   address of the start of the main process stack
+  esp           current value of ESP
+  eip           current value of EIP
+  pending       bitmap of pending signals
+  blocked       bitmap of blocked signals
+  sigign        bitmap of ignored signals
+  sigcatch      bitmap of caught signals
+  0		(place holder, used to be the wchan address,
+		use /proc/PID/wchan instead)
+  0             (place holder)
+  0             (place holder)
+  exit_signal   signal to send to parent thread on exit
+  task_cpu      which CPU the task is scheduled on
+  rt_priority   realtime priority
+  policy        scheduling policy (man sched_setscheduler)
+  blkio_ticks   time spent waiting for block IO
+  gtime         guest time of the task in jiffies
+  cgtime        guest time of the task children in jiffies
+  start_data    address above which program data+bss is placed
+  end_data      address below which program data+bss is placed
+  start_brk     address above which program heap can be expanded with brk()
+  arg_start     address above which program command line is placed
+  arg_end       address below which program command line is placed
+  env_start     address above which program environment is placed
+  env_end       address below which program environment is placed
+  exit_code     the thread's exit_code in the form reported by the waitpid
+                system call
+  ============= ===============================================================
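+
+The Python sketch below is one way to pull a few of the fields above out of a
+single stat line. It assumes the usual layout, in which tcomm is wrapped in
+parentheses. The command name may itself contain spaces or ")", so the
+function splits around the last ")" rather than on whitespace alone. The name
+parse_stat is illustrative only.
+
```python
def parse_stat(line: str) -> dict:
    """Extract a few named fields from one /proc/PID/stat line."""
    # tcomm is enclosed in parentheses and may contain spaces or ")",
    # so locate it by the first "(" and the *last* ")".
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    tcomm = line[open_paren + 1:close_paren]
    # The remaining fields start with "state" (field 3 in Table 1-4),
    # so field number n is found at rest[n - 3].
    rest = line[close_paren + 1:].split()
    return {
        "pid": pid,
        "tcomm": tcomm,
        "state": rest[0],              # field 3
        "ppid": int(rest[1]),          # field 4
        "pgrp": int(rest[2]),          # field 5
        "utime": int(rest[11]),        # field 14, user mode jiffies
        "stime": int(rest[12]),        # field 15, kernel mode jiffies
        "num_threads": int(rest[17]),  # field 20
    }
```
+
+On a live system the same function can be applied to the contents of
+/proc/self/stat.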
+
+The /proc/PID/maps file contains the currently mapped memory regions and
+their access permissions.
+
+The format is::
+
+    address           perms offset  dev   inode      pathname
+
+    08048000-08049000 r-xp 00000000 03:00 8312       /opt/test
+    08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test
+    0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
+    a7cb1000-a7cb2000 ---p 00000000 00:00 0
+    a7cb2000-a7eb2000 rw-p 00000000 00:00 0
+    a7eb2000-a7eb3000 ---p 00000000 00:00 0
+    a7eb3000-a7ed5000 rw-p 00000000 00:00 0
+    a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6
+    a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6
+    a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6
+    a800b000-a800e000 rw-p 00000000 00:00 0
+    a800e000-a8022000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
+    a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
+    a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
+    a8024000-a8027000 rw-p 00000000 00:00 0
+    a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
+    a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
+    a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
+    aff35000-aff4a000 rw-p 00000000 00:00 0          [stack]
+    ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
+
+where "address" is the address space in the process that it occupies, "perms"
+is a set of permissions::
+
+ r = read
+ w = write
+ x = execute
+ s = shared
+ p = private (copy on write)
+
+"offset" is the offset into the mapping, "dev" is the device (major:minor), and
+"inode" is the inode on that device. 0 indicates that no inode is associated
+with the memory region, as would be the case with BSS (uninitialized data).
+The "pathname" shows the name of the file associated with this mapping. If the
+mapping is not associated with a file:
+
+ =======                    ====================================
+ [heap]                     the heap of the program
+ [stack]                    the stack of the main process
+ [vdso]                     the "virtual dynamic shared object",
+                            the kernel system call handler
+ =======                    ====================================
+
+or if the pathname is empty, the mapping is anonymous.
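+
+As a rough illustration, a single maps line can be split into the fields
+described above with a short Python function. This is a sketch, not a
+reference parser. It assumes the six-column layout shown in the example and
+treats a missing pathname as an anonymous mapping. The names MapEntry and
+parse_maps_line are hypothetical. Splitting at most five times keeps any
+spaces inside the pathname intact.
+
```python
from typing import NamedTuple, Optional


class MapEntry(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    dev: str          # "major:minor", kept as text
    inode: int
    pathname: Optional[str]


def parse_maps_line(line: str) -> MapEntry:
    """Split one /proc/PID/maps line into its six columns."""
    # maxsplit=5 leaves any spaces inside the pathname untouched.
    parts = line.split(None, 5)
    start, end = parts[0].split("-")
    # No sixth column means no name at all: an anonymous mapping.
    pathname = parts[5].strip() if len(parts) > 5 else None
    return MapEntry(
        start=int(start, 16),
        end=int(end, 16),
        perms=parts[1],
        offset=int(parts[2], 16),
        dev=parts[3],
        inode=int(parts[4]),
        pathname=pathname,
    )
```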
+
+The /proc/PID/smaps is an extension based on maps, showing the memory
+consumption for each of the process's mappings. For each mapping (aka Virtual
+Memory Area, or VMA) there is a series of lines such as the following::
+
+    08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
+
+    Size:               1084 kB
+    KernelPageSize:        4 kB
+    MMUPageSize:           4 kB
+    Rss:                 892 kB
+    Pss:                 374 kB
+    Shared_Clean:        892 kB
+    Shared_Dirty:          0 kB
+    Private_Clean:         0 kB
+    Private_Dirty:         0 kB
+    Referenced:          892 kB
+    Anonymous:             0 kB
+    LazyFree:              0 kB
+    AnonHugePages:         0 kB
+    ShmemPmdMapped:        0 kB
+    Shared_Hugetlb:        0 kB
+    Private_Hugetlb:       0 kB
+    Swap:                  0 kB
+    SwapPss:               0 kB
+    Locked:                0 kB
+    THPeligible:           0
+    VmFlags: rd ex mr mw me dw
+
+The first of these lines shows the same information as is displayed for the
+mapping in /proc/PID/maps.  Following lines show the size of the mapping
+(Size); the size of each page allocated when backing a VMA (KernelPageSize),
+which is usually the same as the size in the page table entries; the page size
+used by the MMU when backing a VMA (in most cases, the same as KernelPageSize);
+the amount of the mapping that is currently resident in RAM (RSS); the
+process's proportional share of this mapping (PSS); and the number of clean and
+dirty shared and private pages in the mapping.
+
+The "proportional set size" (PSS) of a process is the count of pages it has
+in memory, where each page is divided by the number of processes sharing it.
+So if a process has 1000 pages all to itself, and 1000 shared with one other
+process, its PSS will be 1500.
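+
+The arithmetic in the example can be checked with a few lines of Python. The
+sketch only models the definition given here: each resident page contributes
+1/N, where N is the number of processes mapping it. The kernel uses its own
+fixed-point accounting internally, and the pss_pages name is made up for this
+illustration.
+
```python
from fractions import Fraction


def pss_pages(sharers_per_page):
    """PSS in pages: each resident page counts 1/N, N = processes mapping it."""
    # Fractions keep the result exact (e.g. three 1/3 shares sum to exactly 1).
    return sum((Fraction(1, n) for n in sharers_per_page), Fraction(0))


# 1000 private pages plus 1000 pages shared with one other process.
example = [1] * 1000 + [2] * 1000
print(pss_pages(example))  # 1500
```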
+
+Note that even a page which is part of a MAP_SHARED mapping, but has only
+a single pte mapped, i.e. is currently used by only one process, is accounted
+as private and not as shared.
+
+"Referenced" indicates the amount of memory currently marked as referenced or
+accessed.
+
+"Anonymous" shows the amount of memory that does not belong to any file.  Even
+a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
+and a page is modified, the file page is replaced by a private anonymous copy.
+
+"LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
+The memory isn't freed immediately by madvise(); it is freed under memory
+pressure if the memory is clean. Please note that the printed value might
+be lower than the real value due to optimizations used in the current
+implementation. If this is not desirable please file a bug report.
+
+"AnonHugePages" shows the amount of memory backed by transparent hugepages.
+
+"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
+huge pages.
+
+"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
+hugetlbfs pages, which are *not* counted in the "RSS" or "PSS" fields for
+historical reasons. They are not included in the {Shared,Private}_{Clean,Dirty}
+fields either.
+
+"Swap" shows how much would-be-anonymous memory is also used, but out on swap.
+
+For shmem mappings, "Swap" also includes the size of the mapped (and not
+replaced by copy-on-write) part of the underlying shmem object out on swap.
+"SwapPss" shows the proportional swap share of this mapping. Unlike "Swap",
+this does not take into account swapped-out pages of underlying shmem objects.
+"Locked" indicates whether the mapping is locked in memory or not.
+"THPeligible" indicates whether the mapping is eligible for allocating THP
+pages: 1 if true, 0 otherwise. It just shows the current status.
+
+The "VmFlags" field deserves a separate description. This member represents
+the kernel flags associated with the particular virtual memory area, encoded as
+two-letter codes. The codes are the following:
+
+    ==    =======================================
+    rd    readable
+    wr    writeable
+    ex    executable
+    sh    shared
+    mr    may read
+    mw    may write
+    me    may execute
+    ms    may share
+    gd    stack segment grows down
+    pf    pure PFN range
+    dw    disabled write to the mapped file
+    lo    pages are locked in memory
+    io    memory mapped I/O area
+    sr    sequential read advice provided
+    rr    random read advice provided
+    dc    do not copy area on fork
+    de    do not expand area on remapping
+    ac    area is accountable
+    nr    swap space is not reserved for the area
+    ht    area uses huge tlb pages
+    ar    architecture specific flag
+    dd    do not include area into core dump
+    sd    soft dirty flag
+    mm    mixed map area
+    hg    huge page advice flag
+    nh    no huge page advice flag
+    mg    mergeable advice flag
+    ==    =======================================
+
+Note that there is no guarantee that every flag and associated mnemonic will
+be present in all future kernel releases. Flags may be removed or new ones may
+be added, and the interpretation of their meaning might change in the future as
+well. So each consumer of these flags has to follow each specific kernel
+version for the exact semantics.
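+
+A consumer that wants to print readable names for the VmFlags codes might use a
+lookup table like the one in the following Python sketch. The table simply
+mirrors the list above. Because codes can change between kernel versions,
+unknown codes are reported rather than rejected. VMFLAG_NAMES and
+decode_vmflags are illustrative names, not a kernel or library API.
+
```python
# Mnemonics copied from the table above; the set may differ on other kernels.
VMFLAG_NAMES = {
    "rd": "readable", "wr": "writeable", "ex": "executable", "sh": "shared",
    "mr": "may read", "mw": "may write", "me": "may execute", "ms": "may share",
    "gd": "stack segment grows down", "pf": "pure PFN range",
    "dw": "disabled write to the mapped file",
    "lo": "pages are locked in memory", "io": "memory mapped I/O area",
    "sr": "sequential read advice provided", "rr": "random read advice provided",
    "dc": "do not copy area on fork", "de": "do not expand area on remapping",
    "ac": "area is accountable", "nr": "swap space is not reserved for the area",
    "ht": "area uses huge tlb pages", "ar": "architecture specific flag",
    "dd": "do not include area into core dump", "sd": "soft dirty flag",
    "mm": "mixed map area", "hg": "huge page advice flag",
    "nh": "no huge page advice flag", "mg": "mergeable advice flag",
}


def decode_vmflags(line: str) -> list:
    """Translate a 'VmFlags: rd ex ...' line into descriptive names."""
    _, _, codes = line.partition(":")
    # Unknown codes are passed through instead of raising an error.
    return [VMFLAG_NAMES.get(code, f"unknown ({code})") for code in codes.split()]
```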
+
+This file is only present if the CONFIG_MMU kernel configuration option is
+enabled.
+
+Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy (consistent
+output can be achieved only with a single read call).
+
+This typically manifests when doing partial reads of these files while the
+memory map is being modified.  Despite the races, we do provide the following
+guarantees:
+
+1) The mapped addresses never go backwards, which implies no two
+   regions will ever overlap.
+2) If there is something at a given vaddr during the entirety of the
+   life of the smaps/maps walk, there will be some output for it.
+
+The /proc/PID/smaps_rollup file includes the same fields as /proc/PID/smaps,
+but their values are the sums of the corresponding values for all mappings of
+the process.  Additionally, it contains these fields:
+
+- Pss_Anon
+- Pss_File
+- Pss_Shmem
+
+They represent the proportional shares of anonymous, file, and shmem pages, as
+described for smaps above.  These fields are omitted in smaps since each
+mapping identifies the type (anon, file, or shmem) of all pages it contains.
+Thus all information in smaps_rollup can be derived from smaps, but at a
+significantly higher cost.
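+
+To make the relationship concrete, the Python sketch below approximates the
+summed fields of smaps_rollup by adding up every "Name: <value> kB" line of
+smaps output. It makes several simplifying assumptions:
+
+- per-mapping header lines, VmFlags and THPeligible are ignored;
+- KernelPageSize and MMUPageSize are skipped, because a sum of page sizes is
+  meaningless;
+- the Pss_Anon/Pss_File/Pss_Shmem breakdown is not reproduced.
+
+The function name is hypothetical.
+
```python
from collections import Counter

# Summing page sizes across mappings is meaningless, so skip these fields.
SKIPPED = {"KernelPageSize", "MMUPageSize"}


def rollup(smaps_text: str) -> dict:
    """Sum every 'Name:  <n> kB' field over all mappings in smaps output."""
    totals = Counter()
    for line in smaps_text.splitlines():
        parts = line.split()
        # Mapping header lines, VmFlags and THPeligible do not match this
        # three-token "Name: value kB" shape and are ignored.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            name = parts[0][:-1]
            if name not in SKIPPED:
                totals[name] += int(parts[1])
    return dict(totals)
```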
+
+The /proc/PID/clear_refs file is used to reset the PG_Referenced and
+ACCESSED/YOUNG bits on both physical and virtual pages associated with a
+process, and the soft-dirty bit on ptes (see Documentation/admin-guide/mm/soft-dirty.rst
+for details).
+To clear the bits for all the pages associated with the process::
+
+    > echo 1 > /proc/PID/clear_refs
+
+To clear the bits for the anonymous pages associated with the process::
+
+    > echo 2 > /proc/PID/clear_refs
+
+To clear the bits for the file mapped pages associated with the process::
+
+    > echo 3 > /proc/PID/clear_refs
+
+To clear the soft-dirty bit::
+
+    > echo 4 > /proc/PID/clear_refs
+
+To reset the peak resident set size ("high water mark") to the process's
+current value::
+
+    > echo 5 > /proc/PID/clear_refs
+
+Any other value written to /proc/PID/clear_refs will have no effect.
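+
+Programs can perform the same writes directly. The following Python sketch
+writes one of the documented values to clear_refs. The constant names and the
+proc_root parameter are assumptions made for this example, not kernel
+interfaces; proc_root lets the function be pointed at a scratch directory for
+testing.
+
```python
import os

# Values documented for /proc/PID/clear_refs (constant names are illustrative).
CLEAR_ALL = 1           # all pages of the process
CLEAR_ANON = 2          # anonymous pages only
CLEAR_FILE = 3          # file-mapped pages only
CLEAR_SOFT_DIRTY = 4    # soft-dirty bits
RESET_PEAK_RSS = 5      # reset the RSS high water mark


def clear_refs(pid, value, proc_root="/proc"):
    """Write one of the documented values to <proc_root>/<pid>/clear_refs."""
    if value not in (CLEAR_ALL, CLEAR_ANON, CLEAR_FILE,
                     CLEAR_SOFT_DIRTY, RESET_PEAK_RSS):
        # The kernel ignores other values; reject them early instead.
        raise ValueError(f"unsupported clear_refs value: {value}")
    path = os.path.join(proc_root, str(pid), "clear_refs")
    with open(path, "w") as f:
        f.write(f"{value}\n")  # same bytes as `echo 4 > ...`
```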
+
+The /proc/pid/pagemap file gives the PFN, which can be used to find the
+pageflags using /proc/kpageflags and the number of times a page is mapped using
+/proc/kpagecount. For detailed explanation, see
+Documentation/admin-guide/mm/pagemap.rst.
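+
+As a sketch of how a pagemap entry is located and decoded, the Python below
+computes the file offset for a virtual address (one 64-bit entry per virtual
+page) and extracts a few bits. The bit positions used are 63 (present), 62
+(swapped), 61 (file-page or shared-anon), 55 (soft-dirty) and 0-54 (PFN). They
+follow Documentation/admin-guide/mm/pagemap.rst; consult that file for the
+full layout. A 4 KiB page size and host byte order are assumed. Since Linux
+4.2, the PFN field reads as zero for users without CAP_SYS_ADMIN.
+
```python
import struct

PAGE_SIZE = 4096   # assumption; os.sysconf("SC_PAGE_SIZE") gives the real value
ENTRY_SIZE = 8     # each pagemap entry is one 64-bit word


def pagemap_offset(vaddr: int, page_size: int = PAGE_SIZE) -> int:
    """File offset of the entry describing the page that contains vaddr."""
    return (vaddr // page_size) * ENTRY_SIZE


def decode_pagemap_entry(raw: bytes) -> dict:
    """Decode selected bits of one 8-byte entry read from /proc/pid/pagemap."""
    (entry,) = struct.unpack("=Q", raw)   # native byte order, 64-bit unsigned
    present = bool((entry >> 63) & 1)
    return {
        "present": present,
        "swapped": bool((entry >> 62) & 1),
        "file_or_shared_anon": bool((entry >> 61) & 1),
        "soft_dirty": bool((entry >> 55) & 1),
        # Bits 0-54 are the PFN only while the page is present.
        "pfn": entry & ((1 << 55) - 1) if present else None,
    }
```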
+
+The /proc/pid/numa_maps is an extension based on maps, showing the memory
+locality and binding policy, as well as the memory usage (in pages) of
+each mapping. The output follows a general format where mapping details are
+summarized and separated by blank spaces, one mapping per line::
+
+    address   policy    mapping details
+
+    00400000 default file=/usr/local/bin/app mapped=1 active=0 N3=1 kernelpagesize_kB=4
+    00600000 default file=/usr/local/bin/app anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 N0=24 N3=2 kernelpagesize_kB=4
+    320621f000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206220000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206221000 default anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206800000 default file=/lib64/libc-2.12.so mapped=59 mapmax=21 active=55 N0=41 N3=18 kernelpagesize_kB=4
+    320698b000 default file=/lib64/libc-2.12.so
+    3206b8a000 default file=/lib64/libc-2.12.so anon=2 dirty=2 N3=2 kernelpagesize_kB=4
+    3206b8e000 default file=/lib64/libc-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206b8f000 default anon=3 dirty=3 active=1 N3=3 kernelpagesize_kB=4
+    7f4dc10a2000 default anon=3 dirty=3 N3=3 kernelpagesize_kB=4
+    7f4dc10b4000 default anon=2 dirty=2 active=1 N3=2 kernelpagesize_kB=4
+    7f4dc1200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N3=1 kernelpagesize_kB=2048
+    7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4
+    7fff3369d000 default mapped=1 mapmax=35 active=0 N3=1 kernelpagesize_kB=4
+
+Where:
+
+"address" is the starting address for the mapping;
+
+"policy" reports the NUMA memory policy set for the mapping (see Documentation/admin-guide/mm/numa_memory_policy.rst);
+
+"mapping details" summarizes mapping data such as mapping type, page usage counters,
+node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page
+size, in KB, that is backing the mapping.
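+
+The following Python sketch turns one numa_maps line into a dictionary. This
+makes it easy, for example, to check that the per-node counts add up. It
+assumes the token layout shown above: key=value pairs, N<node>=<pages>
+counters, and bare flags such as "huge" or "stack". The function name is
+hypothetical and the parser is not exhaustive.
+
```python
import re

NODE_RE = re.compile(r"N(\d+)=(\d+)$")


def parse_numa_maps_line(line: str) -> dict:
    """Split one numa_maps line into address, policy, node counts and details."""
    fields = line.split()
    entry = {"address": int(fields[0], 16), "policy": fields[1],
             "nodes": {}, "flags": []}
    for token in fields[2:]:
        m = NODE_RE.match(token)
        if m:
            # N<node>=<pages>: pages of this mapping on that NUMA node.
            entry["nodes"][int(m.group(1))] = int(m.group(2))
        elif "=" in token:
            key, value = token.split("=", 1)
            entry[key] = int(value) if value.isdigit() else value
        else:
            entry["flags"].append(token)   # e.g. "huge", "stack", "heap"
    return entry
```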
+
+1.2 Kernel data
+---------------
+
+Similar to the process entries, the kernel data files give information about
+the running kernel. The files used to obtain this information are contained in
+/proc and are listed in Table 1-5. Not all of these will be present in your
+system. Which files are present, and which are missing, depends on the kernel
+configuration and the loaded modules.
+
+.. table:: Table 1-5: Kernel info in /proc
+
+ ============ ===============================================================
+ File         Content
+ ============ ===============================================================
+ apm          Advanced power management info
+ buddyinfo    Kernel memory allocator information (see text)  (2.5)
+ bus          Directory containing bus specific information
+ cmdline      Kernel command line
+ cpuinfo      Info about the CPU
+ devices      Available devices (block and character)
+ dma          Used DMA channels
+ filesystems  Supported filesystems
+ driver       Various drivers grouped here, currently rtc  (2.4)
+ execdomains  Execdomains, related to security  (2.4)
+ fb           Frame Buffer devices  (2.4)
+ fs           File system parameters, currently nfs/exports  (2.4)
+ ide          Directory containing info about the IDE subsystem
+ interrupts   Interrupt usage
+ iomem        Memory map  (2.4)
+ ioports      I/O port usage
+ irq          Masks for irq to cpu affinity  (2.4)(smp?)
+ isapnp       ISA PnP (Plug&Play) Info  (2.4)
+ kcore        Kernel core image (can be ELF or A.OUT (deprecated in 2.4))
+ kmsg         Kernel messages
+ ksyms        Kernel symbol table
+ loadavg      Load average of last 1, 5 & 15 minutes
+ locks        Kernel locks
+ meminfo      Memory info
+ misc         Miscellaneous
+ modules      List of loaded modules
+ mounts       Mounted filesystems
+ net          Networking info (see text)
+ pagetypeinfo Additional page allocator information (see text)  (2.5)
+ partitions   Table of partitions known to the system
+ pci          Deprecated info of PCI bus (new way -> /proc/bus/pci/,
+              decoupled by lspci)  (2.4)
+ rtc          Real time clock
+ scsi         SCSI info (see text)
+ slabinfo     Slab pool info
+ softirqs     softirq usage
+ stat         Overall statistics
+ swaps        Swap space utilization
+ sys          See chapter 2
+ sysvipc      Info of SysVIPC Resources (msg, sem, shm)  (2.4)
+ tty          Info of tty drivers
+ uptime       Wall clock since boot, combined idle time of all cpus
+ version      Kernel version
+ video        bttv info of video resources  (2.4)
+ vmallocinfo  Show vmalloced areas
+ ============ ===============================================================
+
+You can, for example, check which interrupts are currently in use and what
+they are used for by looking in the file /proc/interrupts::
+
+  > cat /proc/interrupts
+             CPU0
+    0:    8728810          XT-PIC  timer
+    1:        895          XT-PIC  keyboard
+    2:          0          XT-PIC  cascade
+    3:     531695          XT-PIC  aha152x
+    4:    2014133          XT-PIC  serial
+    5:      44401          XT-PIC  pcnet_cs
+    8:          2          XT-PIC  rtc
+   11:          8          XT-PIC  i82365
+   12:     182918          XT-PIC  PS/2 Mouse
+   13:          1          XT-PIC  fpu
+   14:    1232265          XT-PIC  ide0
+   15:          7          XT-PIC  ide1
+  NMI:          0
+
+In 2.4.* a couple of lines were added to this file, LOC and ERR (this time the
+output is from an SMP machine)::
+
+  > cat /proc/interrupts
+
+             CPU0       CPU1
+    0:    1243498    1214548    IO-APIC-edge  timer
+    1:       8949       8958    IO-APIC-edge  keyboard
+    2:          0          0          XT-PIC  cascade
+    5:      11286      10161    IO-APIC-edge  soundblaster
+    8:          1          0    IO-APIC-edge  rtc
+    9:      27422      27407    IO-APIC-edge  3c503
+   12:     113645     113873    IO-APIC-edge  PS/2 Mouse
+   13:          0          0          XT-PIC  fpu
+   14:      22491      24012    IO-APIC-edge  ide0
+   15:       2183       2415    IO-APIC-edge  ide1
+   17:      30564      30414   IO-APIC-level  eth0
+   18:        177        164   IO-APIC-level  bttv
+  NMI:    2457961    2457959
+  LOC:    2457882    2457881
+  ERR:       2155
+
+NMI is incremented in this case because every timer interrupt generates an NMI
+(Non Maskable Interrupt), which is used by the NMI Watchdog to detect lockups.
+
+LOC is the local interrupt counter of the internal APIC of every CPU.
+
+ERR is incremented in the case of errors in the IO-APIC bus (the bus that
+connects the CPUs in an SMP system). This means that an error has been
+detected; the IO-APIC automatically retries the transmission, so it should not
+be a big problem, but you should read the SMP-FAQ.
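+
+To total the counts for each interrupt source across CPUs, a small Python
+sketch like the one below can be used. It assumes the layout shown above: a
+header row with one "CPUn" column per CPU, followed by rows whose leading
+numeric columns are per-CPU counts. Counters are read until the first
+non-numeric field, so the controller type and device name are ignored and
+shorter rows such as ERR also work. The function name is hypothetical.
+
```python
def parse_interrupts(text: str) -> dict:
    """Return {irq: total count over all CPUs} from /proc/interrupts text."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpus = len(lines[0].split())     # header row: "CPU0 CPU1 ..."
    totals = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        total = 0
        # Per-CPU counters come first; stop at the first non-numeric field
        # (controller type, device name) or after ncpus columns.
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            total += int(field)
        totals[name.strip()] = total
    return totals
```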
+
+In 2.6.2* /proc/interrupts was expanded again.  This time the goal was for
+/proc/interrupts to display every IRQ vector in use by the system, not
+just those considered 'most important'.  The new vectors are:
+
+THR
+  interrupt raised when a machine check threshold counter
+  (typically counting ECC corrected errors of memory or cache) exceeds
+  a configurable threshold.  Only available on some systems.
+
+TRM
+  a thermal event interrupt occurs when a temperature threshold
+  has been exceeded for the CPU.  This interrupt may also be generated
+  when the temperature drops back to normal.
+
+SPU
+  a spurious interrupt is some interrupt that was raised then lowered
+  by some IO device before it could be fully processed by the APIC.  Hence
+  the APIC sees the interrupt but does not know what device it came from.
+  For this case the APIC will generate the interrupt with an IRQ vector
+  of 0xff. This might also be generated by chipset bugs.
+
+RES, CAL, TLB
+  rescheduling, call and TLB flush interrupts are
+  sent from one CPU to another per the needs of the OS.  Typically,
+  their statistics are used by kernel developers and interested users to
+  determine the occurrence of interrupts of the given type.
+
+The above IRQ vectors are displayed only when relevant.  For example,
+the threshold vector does not exist on x86_64 platforms.  Others are
+suppressed when the system is a uniprocessor.  As of this writing, only
+i386 and x86_64 platforms support the new IRQ vector displays.
+
+Of some interest is the introduction of the /proc/irq directory in 2.4.
+It can be used to set IRQ to CPU affinity. This means that you can "hook" an
+IRQ to only one CPU, or exclude a CPU from handling IRQs. The irq directory
+contains one subdirectory for each IRQ, and two files: default_smp_affinity
+and prof_cpu_mask.
+
+For example::
+
+  > ls /proc/irq/
+  0  10  12  14  16  18  2  4  6  8  prof_cpu_mask
+  1  11  13  15  17  19  3  5  7  9  default_smp_affinity
+  > ls /proc/irq/0/
+  smp_affinity
+
+smp_affinity is a bitmask in which you can specify which CPUs can handle the
+IRQ. You can set it by doing::
+
+  > echo 1 > /proc/irq/10/smp_affinity
+
+This means that only the first CPU will handle the IRQ, but you can also echo
+5, which means that only the first and third CPUs can handle the IRQ.
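+
+The bitmask arithmetic can be sketched in Python as follows: CPU n
+corresponds to bit n, so CPUs 0 and 2 give 0b101, i.e. 5. The helper names are
+hypothetical. The sketch assumes the comma-separated 32-bit word format that
+the kernel uses when printing masks for more than 32 CPUs.
+
```python
def cpus_to_mask(cpus) -> str:
    """Hex mask for smp_affinity: bit n set means CPU n may handle the IRQ."""
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu
    digits = format(bits, "x")
    # Group into 32-bit words (8 hex digits), most significant word first.
    words = []
    while len(digits) > 8:
        words.insert(0, digits[-8:])
        digits = digits[:-8]
    words.insert(0, digits)
    return ",".join(words)


def mask_to_cpus(mask: str) -> list:
    """Inverse of cpus_to_mask: list the CPU numbers whose bits are set."""
    bits = 0
    for word in mask.strip().split(","):
        bits = (bits << 32) | int(word, 16)
    return [cpu for cpu in range(bits.bit_length()) if (bits >> cpu) & 1]
```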
+
+The contents of each smp_affinity file are the same by default::
+
+  > cat /proc/irq/0/smp_affinity
+  ffffffff
+
+There is an alternate interface, smp_affinity_list, which allows specifying
+a CPU range instead of a bitmask::
+
+  > cat /proc/irq/0/smp_affinity_list
+  1024-1031
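+
+A list in the smp_affinity_list format can be expanded with a short Python
+function such as the one below. It is a sketch that only handles single CPUs
+and "first-last" ranges separated by commas. Any other syntax a given kernel
+may accept is not covered. The function name is hypothetical.
+
```python
def parse_cpu_list(text: str) -> list:
    """Expand a list such as "0-3,8,10-11" into [0, 1, 2, 3, 8, 10, 11]."""
    cpus = []
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            # Inclusive range "first-last".
            first, last = (int(x) for x in chunk.split("-"))
            cpus.extend(range(first, last + 1))
        else:
            cpus.append(int(chunk))
    return cpus
```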
+
+The default_smp_affinity mask applies to all non-active IRQs, which are the
+IRQs which have not yet been allocated/activated, and hence which lack a
+/proc/irq/[0-9]* directory.
+
+The node file on an SMP system shows the node to which the device using the IRQ
+reports itself as being attached. This hardware locality information does not
+include information about any possible driver locality preference.
+
+prof_cpu_mask specifies which CPUs are to be profiled by the system wide
+profiler. Default value is ffffffff (all cpus if there are only 32 of them).
+
+The way IRQs are routed is handled by the IO-APIC, and it is round-robin
+between all the CPUs which are allowed to handle it. As usual the kernel has
+more info than you and does a better job than you, so the defaults are the
+best choice for almost everyone. [Note this applies only to those IO-APICs
+that support "Round Robin" interrupt distribution.]
+
+There are three more important subdirectories in /proc: net, scsi, and sys.
+The general rule is that the contents, or even the existence of these
+directories, depend on your kernel configuration. If SCSI is not enabled, the
+directory scsi may not exist. The same is true for net, which is there
+only when networking support is present in the running kernel.
+
+The slabinfo file gives information about memory usage at the slab level.
+Linux uses slab pools for memory management above page level in version 2.2.
+Commonly used objects have their own slab pool (such as network buffers,
+directory cache, and so on).
+
+The buddyinfo file gives information about the kernel memory allocator::
+
+    > cat /proc/buddyinfo
+
+    Node 0, zone      DMA      0      4      5      4      4      3 ...
+    Node 0, zone   Normal      1      0      0      1    101      8 ...
+    Node 0, zone  HighMem      2      0      0      1      1      0 ...
+
+External fragmentation is a problem under some workloads, and buddyinfo is a
+useful tool for helping diagnose these problems.  Buddyinfo will give you a
+clue as to how big an area you can safely allocate, or why a previous
+allocation failed.
+
+Each column represents the number of free chunks of a certain order that are
+available.  In this case, there are 0 chunks of 2^0*PAGE_SIZE available in
+ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
+available in ZONE_NORMAL, etc.
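+
+The column for order n counts free chunks of 2^n pages. The total number of
+free pages in a zone is therefore the sum of count * 2^n over all columns. A
+Python sketch of that calculation for one buddyinfo line follows. It assumes
+the "Node N, zone NAME counts..." layout shown above; the sample is truncated
+with "...", so a complete line must be passed in. The function name is
+hypothetical.
+
```python
def buddyinfo_free_pages(line: str):
    """Return (node, zone, total free pages) for one /proc/buddyinfo line."""
    head, _, counts = line.partition("zone")
    node = int(head.replace("Node", "").strip().rstrip(","))
    fields = counts.split()
    zone = fields[0]
    # Column i counts free chunks of order i, each 2**i pages long.
    free = sum(int(n) << order for order, n in enumerate(fields[1:]))
    return node, zone, free
```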
+
+More information relevant to external fragmentation can be found in
+pagetypeinfo::
+
+    > cat /proc/pagetypeinfo
+    Page block order: 9
+    Pages per block:  512
+
+    Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
+    Node    0, zone      DMA, type    Unmovable      0      0      0      1      1      1      1      1      1      1      0
+    Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
+    Node    0, zone      DMA, type      Movable      1      1      2      1      2      1      1      0      1      0      2
+    Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
+    Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
+    Node    0, zone    DMA32, type    Unmovable    103     54     77      1      1      1     11      8      7      1      9
+    Node    0, zone    DMA32, type  Reclaimable      0      0      2      1      0      0      0      0      1      0      0
+    Node    0, zone    DMA32, type      Movable    169    152    113     91     77     54     39     13      6      1    452
+    Node    0, zone    DMA32, type      Reserve      1      2      2      2      2      0      1      1      1      1      0
+    Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
+
+    Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
+    Node 0, zone      DMA            2            0            5            1            0
+    Node 0, zone    DMA32           41            6          967            2            0
+
+Fragmentation avoidance in the kernel works by grouping pages of the same
+migrate type into the same contiguous regions of memory, called page blocks.
+A page block is typically the size of the default hugepage size e.g. 2MB on
+X86-64. By keeping pages grouped based on their ability to move, the kernel
+can reclaim pages within a page block to satisfy a high-order allocation.
+
+The pagetypeinfo file begins with information on the size of a page block. It
+then gives the same type of information as buddyinfo except broken down
+by migrate-type and finishes with details on how many page blocks of each
+type exist.
+
+If min_free_kbytes has been tuned correctly (recommendations made by hugeadm
+from libhugetlbfs https://github.com/libhugetlbfs/libhugetlbfs/), one can
+make an estimate of the likely number of huge pages that can be allocated
+at a given point in time. All the "Movable" blocks should be allocatable
+unless memory has been mlock()'d. Some of the Reclaimable blocks should
+also be allocatable although a lot of filesystem metadata may have to be
+reclaimed to achieve this.
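+
+One crude way to express the "Movable blocks should be allocatable" rule of
+thumb is to sum the Movable column of the "Number of blocks type" table. The
+Python sketch below does that. It assumes the layout shown above and looks the
+column up by name, since the set of migrate types differs between kernel
+versions. When the page block size equals the huge page size, the result is an
+optimistic upper bound on the number of huge pages obtainable from movable
+blocks. The function name is hypothetical.
+
```python
def movable_blocks(pagetypeinfo_text: str) -> int:
    """Total number of Movable page blocks across all nodes and zones."""
    lines = pagetypeinfo_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("Number of blocks type"):
            continue
        # Column names follow the four words "Number of blocks type".
        columns = line.split()[4:]
        col = columns.index("Movable")
        total = 0
        for row in lines[i + 1:]:
            if not row.startswith("Node"):
                break
            # "Node 0, zone DMA32 41 6 967 ..." -> counts start at token 4.
            total += int(row.split()[4:][col])
        return total
    raise ValueError("no 'Number of blocks type' section found")
```
+
+For the sample output above, this gives 5 + 967 = 972 movable blocks.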
+
+
+meminfo
+~~~~~~~
+
+Provides information about distribution and utilization of memory.  This
+varies by architecture and compile options.  The following is from a
+16GB PIII, which has highmem enabled.  You may not have all of these fields.
+
+::
+
+    > cat /proc/meminfo
+
+    MemTotal:     16344972 kB
+    MemFree:      13634064 kB
+    MemAvailable: 14836172 kB
+    Buffers:          3656 kB
+    Cached:        1195708 kB
+    SwapCached:          0 kB
+    Active:         891636 kB
+    Inactive:      1077224 kB
+    HighTotal:    15597528 kB
+    HighFree:     13629632 kB
+    LowTotal:       747444 kB
+    LowFree:          4432 kB
+    SwapTotal:           0 kB
+    SwapFree:            0 kB
+    Dirty:             968 kB
+    Writeback:           0 kB
+    AnonPages:      861800 kB
+    Mapped:         280372 kB
+    Shmem:             644 kB
+    KReclaimable:   168048 kB
+    Slab:           284364 kB
+    SReclaimable:   159856 kB
+    SUnreclaim:     124508 kB
+    PageTables:      24448 kB
+    NFS_Unstable:        0 kB
+    Bounce:              0 kB
+    WritebackTmp:        0 kB
+    CommitLimit:   7669796 kB
+    Committed_AS:   100056 kB
+    VmallocTotal:   112216 kB
+    VmallocUsed:       428 kB
+    VmallocChunk:   111088 kB
+    Percpu:          62080 kB
+    HardwareCorrupted:   0 kB
+    AnonHugePages:   49152 kB
+    ShmemHugePages:      0 kB
+    ShmemPmdMapped:      0 kB
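+
+For scripts, it is often convenient to read /proc/meminfo into a dictionary.
+The Python sketch below does so, assuming the "Name: value [kB]" layout shown
+above. Values are returned as integers, in kB for fields that carry a unit.
+The function name is hypothetical.
+
```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its integer value."""
    info = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        fields = value.split()
        if not sep or not fields:
            continue  # skip blank or malformed lines
        # fields[0] is the number; fields[1], when present, is the unit ("kB").
        info[name.strip()] = int(fields[0])
    return info
```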
+
+MemTotal
+              Total usable ram (i.e. physical ram minus a few reserved
+              bits and the kernel binary code)
+MemFree
+              The sum of LowFree+HighFree
+MemAvailable
+              An estimate of how much memory is available for starting new
+              applications, without swapping. Calculated from MemFree,
+              SReclaimable, the size of the file LRU lists, and the low
+              watermarks in each zone.
+              The estimate takes into account that the system needs some
+              page cache to function well, and that not all reclaimable
+              slab will be reclaimable, due to items being in use. The
+              impact of those factors will vary from system to system.
+Buffers
+              Relatively temporary storage for raw disk blocks; it
+              shouldn't get tremendously large (20MB or so)
+Cached
+              in-memory cache for files read from the disk (the
+              pagecache).  Doesn't include SwapCached
+SwapCached
+              Memory that once was swapped out, is swapped back in but
+              still also is in the swapfile (if memory is needed it
+              doesn't need to be swapped out AGAIN because it is already
+              in the swapfile. This saves I/O)
+Active
+              Memory that has been used more recently and usually not
+              reclaimed unless absolutely necessary.
+Inactive
+              Memory which has been less recently used.  It is more
+              eligible to be reclaimed for other purposes
+HighTotal, HighFree
+              Highmem is all memory above ~860MB of physical memory.
+              Highmem areas are for use by userspace programs, or
+              for the pagecache.  The kernel must use tricks to access
+              this memory, making it slower to access than lowmem.
+LowTotal, LowFree
+              Lowmem is memory which can be used for everything that
+              highmem can be used for, but it is also available for the
+              kernel's use for its own data structures.  Among many
+              other things, it is where everything from the Slab is
+              allocated.  Bad things happen when you're out of lowmem.
+SwapTotal
+              total amount of swap space available
+SwapFree
+              Amount of swap space that is currently unused
+Dirty
+              Memory which is waiting to get written back to the disk
+Writeback
+              Memory which is actively being written back to the disk
+AnonPages
+              Non-file backed pages mapped into userspace page tables
+HardwareCorrupted
+              The amount of RAM/memory in KB that the kernel identifies as
+              corrupted.
+AnonHugePages
+              Non-file backed huge pages mapped into userspace page tables
+Mapped
+              files which have been mmapped, such as libraries
+Shmem
+              Total memory used by shared memory (shmem) and tmpfs
+ShmemHugePages
+              Memory used by shared memory (shmem) and tmpfs allocated
+              with huge pages
+ShmemPmdMapped
+              Shared memory mapped into userspace with huge pages
+KReclaimable
+              Kernel allocations that the kernel will attempt to reclaim
+              under memory pressure. Includes SReclaimable (below), and other
+              direct allocations with a shrinker.
+Slab
+              in-kernel data structures cache
+SReclaimable
+              Part of Slab, that might be reclaimed, such as caches
+SUnreclaim
+              Part of Slab, that cannot be reclaimed on memory pressure
+PageTables
+              amount of memory dedicated to the lowest level of page
+              tables.
+NFS_Unstable
+              NFS pages sent to the server, but not yet committed to stable
+              storage
+Bounce
+              Memory used for block device "bounce buffers"
+WritebackTmp
+              Memory used by FUSE for temporary writeback buffers
+CommitLimit
+              Based on the overcommit ratio ('vm.overcommit_ratio'),
+              this is the total amount of memory currently available to
+              be allocated on the system. This limit is only adhered to
+              if strict overcommit accounting is enabled (mode 2 in
+              'vm.overcommit_memory').
+
+              The CommitLimit is calculated with the following formula::
+
+                CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
+                               overcommit_ratio / 100 + [total swap pages]
+
+              For example, on a system with 1G of physical RAM and 7G
+              of swap with a `vm.overcommit_ratio` of 30 it would
+              yield a CommitLimit of 7.3G.
+
+              For more details, see the memory overcommit documentation
+              in vm/overcommit-accounting.
+Committed_AS
+              The amount of memory presently allocated on the system.
+              The committed memory is a sum of all of the memory which
+              has been allocated by processes, even if it has not been
+              "used" by them as of yet. A process which malloc()s 1G
+              of memory, but only touches 300M of it, will show up as
+              using 1G. This 1G is memory which has been "committed" to
+              by the VM and can be used at any time by the allocating
+              application. With strict overcommit enabled on the system
+              (mode 2 in ``vm.overcommit_memory``), allocations which would
+              exceed the CommitLimit (detailed above) will not be permitted.
+              This is useful if one needs to guarantee that processes will
+              not fail due to lack of memory once that memory has been
+              successfully allocated.
+VmallocTotal
+              Total size of vmalloc memory area
+VmallocUsed
+              Amount of vmalloc area which is used
+VmallocChunk
+              Largest contiguous block of vmalloc area which is free
+Percpu
+              Memory allocated to the percpu allocator used to back percpu
+              allocations. This stat excludes the cost of metadata.
+
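+As a quick check of the CommitLimit formula, the arithmetic from the example
+can be reproduced in a few lines of Python. This is only an illustration: the
+``commit_limit()`` helper is hypothetical, it works in GiB rather than pages,
+and it assumes that no huge TLB pages are reserved.
+
```python
def commit_limit(total_ram, hugetlb, overcommit_ratio, total_swap):
    """Hypothetical helper mirroring the CommitLimit formula.

    All sizes must use the same unit (pages, bytes or GiB).
    """
    return (total_ram - hugetlb) * overcommit_ratio / 100 + total_swap


# 1G of RAM, no huge TLB pages, vm.overcommit_ratio = 30, 7G of swap.
limit = commit_limit(total_ram=1.0, hugetlb=0.0, overcommit_ratio=30,
                     total_swap=7.0)
print(f"CommitLimit = {limit:.1f}G")
```
+
+Because the formula is linear, any consistent unit gives the same result once
+converted back.
+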
+vmallocinfo
+~~~~~~~~~~~
+
+Provides information about vmalloced/vmapped areas. One line per area,
+containing the virtual address range of the area, size in bytes,
+caller information of the creator, and optional information depending
+on the kind of area:
+
+ ==========  ===================================================
+ pages=nr    number of pages
+ phys=addr   if a physical address was specified
+ ioremap     I/O mapping (ioremap() and friends)
+ vmalloc     vmalloc() area
+ vmap        vmap()ed pages
+ user        VM_USERMAP area
+ vpages      buffer for pages pointers was vmalloced (huge area)
+ N<node>=nr  (Only on NUMA kernels)
+             Number of pages allocated on memory node <node>
+ ==========  ===================================================
+
+::
+
+    > cat /proc/vmallocinfo
+    0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
+    /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
+    0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
+    /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
+    0xffffc20000302000-0xffffc20000304000    8192 acpi_tb_verify_table+0x21/0x4f...
+    phys=7fee8000 ioremap
+    0xffffc20000304000-0xffffc20000307000   12288 acpi_tb_verify_table+0x21/0x4f...
+    phys=7fee7000 ioremap
+    0xffffc2000031d000-0xffffc2000031f000    8192 init_vdso_vars+0x112/0x210
+    0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e ...
+    /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
+    0xffffc2000033a000-0xffffc2000033d000   12288 sys_swapon+0x640/0xac0      ...
+    pages=2 vmalloc N1=2
+    0xffffc20000347000-0xffffc2000034c000   20480 xt_alloc_table_info+0xfe ...
+    /0x130 [x_tables] pages=4 vmalloc N0=4
+    0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 ...
+    pages=14 vmalloc N2=14
+    0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 ...
+    pages=4 vmalloc N1=4
+    0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 ...
+    pages=2 vmalloc N1=2
+    0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
+    pages=10 vmalloc N0=10
+
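+The fields of each line are whitespace separated, so a line is straightforward
+to split. The Python sketch below is illustrative only: the regular expressions
+and ``parse_vmallocinfo_line()`` are hypothetical, not a kernel or library API.
+It parses the first line of the sample output (re-joined, since the listing
+above wraps long lines) and checks that the size column matches the address
+range.
+
```python
import re

# "start-end size caller" at the beginning of a /proc/vmallocinfo line.
LINE_RE = re.compile(r"^(0x[0-9a-f]+)-(0x[0-9a-f]+)\s+(\d+)\s+(\S+)")
# Optional per-node "N<node>=nr" counters (NUMA kernels only).
NODE_RE = re.compile(r"\bN(\d+)=(\d+)\b")


def parse_vmallocinfo_line(line):
    """Split one unwrapped /proc/vmallocinfo line into its fields."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return {
        "start": int(m.group(1), 16),
        "end": int(m.group(2), 16),
        "size": int(m.group(3)),
        "caller": m.group(4),
        "nodes": {int(n): int(nr) for n, nr in NODE_RE.findall(line)},
    }


sample = ("0xffffc20000000000-0xffffc20000201000 2101248 "
          "alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc "
          "N0=128 N1=128 N2=128 N3=128")
area = parse_vmallocinfo_line(sample)
print(area["size"] == area["end"] - area["start"], sum(area["nodes"].values()))
```
+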
+
+softirqs
+~~~~~~~~
+
+Provides counts of softirq handlers serviced since boot time, for each CPU.
+
+::
+
+    > cat /proc/softirqs
+                        CPU0       CPU1       CPU2       CPU3
+              HI:          0          0          0          0
+           TIMER:      27166      27120      27097      27034
+          NET_TX:          0          0          0         17
+          NET_RX:         42          0          0         39
+           BLOCK:          0          0        107       1121
+         TASKLET:          0          0          0        290
+           SCHED:      27035      26983      26971      26746
+         HRTIMER:          0          0          0          0
+             RCU:       1678       1769       2178       2250
+
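+Each row is a softirq name followed by one counter per CPU column, so the total
+for a softirq type across all CPUs is a simple sum. A minimal Python sketch,
+using an excerpt of the sample output above; ``parse_softirqs()`` is a
+hypothetical helper:
+
```python
def parse_softirqs(text):
    """Return the CPU column names and a {softirq: [per-CPU counts]} map."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()                  # header: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        name, _, counts = line.partition(":")
        values = [int(v) for v in counts.split()]
        if len(values) != len(cpus):
            raise ValueError(f"column mismatch in {line!r}")
        table[name.strip()] = values
    return cpus, table


sample = """\
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
       TIMER:      27166      27120      27097      27034
      NET_RX:         42          0          0         39
"""
cpus, table = parse_softirqs(sample)
print(len(cpus), sum(table["TIMER"]), sum(table["NET_RX"]))
```
+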
+
+1.3 IDE devices in /proc/ide
+----------------------------
+
+The subdirectory /proc/ide contains information about all IDE devices of which
+the kernel is aware. There is one subdirectory for each IDE controller, the
+file ``drivers`` and a link for each IDE device, pointing to the device
+directory in the controller-specific subtree.
+
+The file ``drivers`` contains general information about the drivers used for
+the IDE devices::
+
+  > cat /proc/ide/drivers
+  ide-cdrom version 4.53
+  ide-disk version 1.08
+
+More detailed information can be found in the controller-specific
+subdirectories. These are named ide0, ide1 and so on. Each of these
+directories contains the files shown in table 1-6.
+
+
+.. table:: Table 1-6: IDE controller info in /proc/ide/ide?
+
+ ======= =======================================
+ File    Content
+ ======= =======================================
+ channel IDE channel (0 or 1)
+ config  Configuration (only for PCI/IDE bridge)
+ mate    Mate name
+ model   Type/Chipset of IDE controller
+ ======= =======================================
+
+Each device connected to a controller has a separate subdirectory in the
+controller's directory. The files listed in table 1-7 are contained in these
+directories.
+
+
+.. table:: Table 1-7: IDE device information
+
+ ================ ==========================================
+ File             Content
+ ================ ==========================================
+ cache            The cache
+ capacity         Capacity of the medium (in 512Byte blocks)
+ driver           driver and version
+ geometry         physical and logical geometry
+ identify         device identify block
+ media            media type
+ model            device identifier
+ settings         device setup
+ smart_thresholds IDE disk management thresholds
+ smart_values     IDE disk management values
+ ================ ==========================================
+
+The most interesting file is ``settings``. This file contains a nice
+overview of the drive parameters::
+
+  # cat /proc/ide/ide0/hda/settings
+  name                    value           min             max             mode
+  ----                    -----           ---             ---             ----
+  bios_cyl                526             0               65535           rw
+  bios_head               255             0               255             rw
+  bios_sect               63              0               63              rw
+  breada_readahead        4               0               127             rw
+  bswap                   0               0               1               r
+  file_readahead          72              0               2097151         rw
+  io_32bit                0               0               3               rw
+  keepsettings            0               0               1               rw
+  max_kb_per_request      122             1               127             rw
+  multcount               0               0               8               rw
+  nice1                   1               0               1               rw
+  nowerr                  0               0               1               rw
+  pio_mode                write-only      0               255             w
+  slow                    0               0               1               rw
+  unmaskirq               0               0               1               rw
+  using_dma               0               0               1               rw
+
+
+1.4 Networking info in /proc/net
+--------------------------------
+
+The subdirectory /proc/net follows the usual pattern. Table 1-8 shows the
+additional values you get for IP version 6 if you configure the kernel to
+support this. Table 1-9 lists the files and their meaning.
+
+
+.. table:: Table 1-8: IPv6 info in /proc/net
+
+ ========== =====================================================
+ File       Content
+ ========== =====================================================
+ udp6       UDP sockets (IPv6)
+ tcp6       TCP sockets (IPv6)
+ raw6       Raw device statistics (IPv6)
+ igmp6      IP multicast addresses which this host joined (IPv6)
+ if_inet6   List of IPv6 interface addresses
+ ipv6_route Kernel routing table for IPv6
+ rt6_stats  Global IPv6 routing table statistics
+ sockstat6  Socket statistics (IPv6)
+ snmp6      SNMP data (IPv6)
+ ========== =====================================================
+
+.. table:: Table 1-9: Network info in /proc/net
+
+ ============= ================================================================
+ File          Content
+ ============= ================================================================
+ arp           Kernel ARP table
+ dev           network devices with statistics
+ dev_mcast     the Layer2 multicast groups a device is listening to
+               (interface index, label, number of references, number of bound
+               addresses).
+ dev_stat      network device status
+ ip_fwchains   Firewall chain linkage
+ ip_fwnames    Firewall chain names
+ ip_masq       Directory containing the masquerading tables
+ ip_masquerade Major masquerading table
+ netstat       Network statistics
+ raw           raw device statistics
+ route         Kernel routing table
+ rpc           Directory containing rpc info
+ rt_cache      Routing cache
+ snmp          SNMP data
+ sockstat      Socket statistics
+ tcp           TCP sockets
+ udp           UDP sockets
+ unix          UNIX domain sockets
+ wireless      Wireless interface data (WaveLAN, etc.)
+ igmp          IP multicast addresses which this host joined
+ psched        Global packet scheduler parameters.
+ netlink       List of PF_NETLINK sockets
+ ip_mr_vifs    List of multicast virtual interfaces
+ ip_mr_cache   List of multicast routing cache
+ ============= ================================================================
+
+You can use this information to see which network devices are available in
+your system and how much traffic was routed over those devices::
+
+  > cat /proc/net/dev
+  Inter-|Receive                                                   |[...
+   face |bytes    packets errs drop fifo frame compressed multicast|[...
+      lo:  908188   5596     0    0    0     0          0         0 [...
+    ppp0:15475140  20721   410    0    0   410          0         0 [...
+    eth0:  614530   7085     0    0    0     0          0         1 [...
+
+  ...] Transmit
+  ...] bytes    packets errs drop fifo colls carrier compressed
+  ...]  908188     5596    0    0    0     0       0          0
+  ...] 1375103    17405    0    0    0     0       0          0
+  ...] 1703981     5535    0    0    0     3       0          0
+
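+In the real file the receive and transmit halves of each interface line appear
+side by side (the listing above splits them for width), so per-interface byte
+counts can be picked out by column position. The following Python sketch
+assumes the 8 receive and 8 transmit columns shown in the header; the
+``parse_net_dev()`` name is hypothetical.
+
```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:       # skip the two header lines
        if ":" not in line:
            continue
        # Split on the colon first: a wide counter can run straight into
        # the interface name, as in "ppp0:15475140".
        name, _, fields = line.partition(":")
        values = [int(v) for v in fields.split()]
        # 8 receive columns come first, followed by 8 transmit columns.
        stats[name.strip()] = (values[0], values[8])
    return stats


sample = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  908188    5596    0    0    0     0          0         0   908188    5596    0    0    0     0       0          0
  ppp0:15475140   20721  410    0    0   410          0         0  1375103   17405    0    0    0     0       0          0
  eth0:  614530    7085    0    0    0     0          0         1  1703981    5535    0    0    0     3       0          0
"""
for iface, (rx, tx) in parse_net_dev(sample).items():
    print(f"{iface}: rx={rx} tx={tx}")
```
+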
+In addition, each Channel Bond interface has its own directory. For
+example, the bond0 device will have a directory called /proc/net/bond0/.
+It will contain information that is specific to that bond, such as the
+current slaves of the bond, the link status of the slaves, and how
+many times the slaves' links have failed.
+
+1.5 SCSI info
+-------------
+
+If you have a SCSI host adapter in your system, you'll find a subdirectory
+named after the driver for this adapter in /proc/scsi. You'll also see a list
+of all recognized SCSI devices in /proc/scsi::
+
+  > cat /proc/scsi/scsi
+  Attached devices:
+  Host: scsi0 Channel: 00 Id: 00 Lun: 00
+    Vendor: IBM      Model: DGHS09U          Rev: 03E0
+    Type:   Direct-Access                    ANSI SCSI revision: 03
+  Host: scsi0 Channel: 00 Id: 06 Lun: 00
+    Vendor: PIONEER  Model: CD-ROM DR-U06S   Rev: 1.04
+    Type:   CD-ROM                           ANSI SCSI revision: 02
+
+
+The directory named after the driver has one file for each adapter found in
+the system. These files contain information about the controller, including
+the IRQ used and the I/O address range. The amount of information shown
+depends on the adapter you use. The example shows the output for an Adaptec
+AHA-2940 SCSI adapter::
+
+  > cat /proc/scsi/aic7xxx/0
+
+  Adaptec AIC7xxx driver version: 5.1.19/3.2.4
+  Compile Options:
+    TCQ Enabled By Default : Disabled
+    AIC7XXX_PROC_STATS     : Disabled
+    AIC7XXX_RESET_DELAY    : 5
+  Adapter Configuration:
+             SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
+                             Ultra Wide Controller
+      PCI MMAPed I/O Base: 0xeb001000
+   Adapter SEEPROM Config: SEEPROM found and used.
+        Adaptec SCSI BIOS: Enabled
+                      IRQ: 10
+                     SCBs: Active 0, Max Active 2,
+                           Allocated 15, HW 16, Page 255
+               Interrupts: 160328
+        BIOS Control Word: 0x18b6
+     Adapter Control Word: 0x005b
+     Extended Translation: Enabled
+  Disconnect Enable Flags: 0xffff
+       Ultra Enable Flags: 0x0001
+   Tag Queue Enable Flags: 0x0000
+  Ordered Queue Tag Flags: 0x0000
+  Default Tag Queue Depth: 8
+      Tagged Queue By Device array for aic7xxx host instance 0:
+        {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
+      Actual queue depth per device for aic7xxx host instance 0:
+        {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
+  Statistics:
+  (scsi0:0:0:0)
+    Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8
+    Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0)
+    Total transfers 160151 (74577 reads and 85574 writes)
+  (scsi0:0:6:0)
+    Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15
+    Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0)
+    Total transfers 0 (0 reads and 0 writes)
+
+
+1.6 Parallel port info in /proc/parport
+---------------------------------------
+
+The directory /proc/parport contains information about the parallel ports of
+your system. It has one subdirectory for each port, named after the port
+number (0, 1, 2, ...).
+
+These directories contain the four files shown in Table 1-10.
+
+
+.. table:: Table 1-10: Files in /proc/parport
+
+ ========= ====================================================================
+ File      Content
+ ========= ====================================================================
+ autoprobe Any IEEE-1284 device ID information that has been acquired.
+ devices   list of the device drivers using that port. A + will appear by the
+           name of the device currently using the port (it might not appear
+           against any).
+ hardware  Parallel port's base address, IRQ line and DMA channel.
+ irq       IRQ that parport is using for that port. This is in a separate
+           file to allow you to alter it by writing a new value in (IRQ
+           number or none).
+ ========= ====================================================================
+
+1.7 TTY info in /proc/tty
+-------------------------
+
+Information about the available and currently used TTYs can be found in the
+directory /proc/tty. You'll find entries for drivers and line disciplines in
+this directory, as shown in Table 1-11.
+
+
+.. table:: Table 1-11: Files in /proc/tty
+
+ ============= ==============================================
+ File          Content
+ ============= ==============================================
+ drivers       list of drivers and their usage
+ ldiscs        registered line disciplines
+ driver/serial usage statistic and status of single tty lines
+ ============= ==============================================
+
+To see which TTYs are currently in use, you can simply look into the file
+/proc/tty/drivers::
+
+  > cat /proc/tty/drivers
+  pty_slave            /dev/pts      136   0-255 pty:slave
+  pty_master           /dev/ptm      128   0-255 pty:master
+  pty_slave            /dev/ttyp       3   0-255 pty:slave
+  pty_master           /dev/pty        2   0-255 pty:master
+  serial               /dev/cua        5   64-67 serial:callout
+  serial               /dev/ttyS       4   64-67 serial
+  /dev/tty0            /dev/tty0       4       0 system:vtmaster
+  /dev/ptmx            /dev/ptmx       5       2 system
+  /dev/console         /dev/console    5       1 system:console
+  /dev/tty             /dev/tty        5       0 system:/dev/tty
+  unknown              /dev/tty        4    1-63 console
+
+
+1.8 Miscellaneous kernel statistics in /proc/stat
+-------------------------------------------------
+
+Various pieces of information about kernel activity are available in the
+/proc/stat file. All of the numbers reported in this file are aggregates
+since the system first booted. For a quick look, simply cat the file::
+
+  > cat /proc/stat
+  cpu  2255 34 2290 22625563 6290 127 456 0 0 0
+  cpu0 1132 34 1441 11311718 3675 127 438 0 0 0
+  cpu1 1123 0 849 11313845 2614 0 18 0 0 0
+  intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...]
+  ctxt 1990473
+  btime 1062191376
+  processes 2915
+  procs_running 1
+  procs_blocked 0
+  softirq 183433 0 21755 12 39 1137 231 21459 2263
+
+The very first "cpu" line aggregates the numbers in all of the other "cpuN"
+lines. These numbers identify the amount of time the CPU has spent performing
+different kinds of work. Time units are in USER_HZ (typically hundredths of a
+second). The meanings of the columns are as follows, from left to right:
+
+- user: normal processes executing in user mode
+- nice: niced processes executing in user mode
+- system: processes executing in kernel mode
+- idle: twiddling thumbs
+- iowait: In a word, iowait stands for waiting for I/O to complete. But there
+  are several problems:
+
+  1. The CPU will not wait for I/O to complete; iowait is the time that a task
+     is waiting for I/O to complete. When a CPU goes into the idle state while
+     a task has outstanding I/O, another task will be scheduled on this CPU.
+  2. In a multi-core CPU, the task waiting for I/O to complete is not running
+     on any CPU, so the iowait of each CPU is difficult to calculate.
+  3. The value of the iowait field in /proc/stat will decrease in certain
+     conditions.
+
+  So, the iowait value read from /proc/stat is not reliable.
+- irq: servicing interrupts
+- softirq: servicing softirqs
+- steal: involuntary wait
+- guest: running a normal guest
+- guest_nice: running a niced guest
+
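+Because all of these counters accumulate from boot, CPU utilization is usually
+derived from the difference between two samples of the same line. The Python
+sketch below is one way to do it, not an official tool: the helper names are
+hypothetical, it counts iowait as idle time (a common but debatable choice
+given the caveats above), and it leaves guest and guest_nice out of the total
+because the kernel already accounts guest time in user and nice.
+
```python
# Column order of a "cpu"/"cpuN" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Turn 'cpu  2255 34 ...' into (name, {field: USER_HZ ticks})."""
    name, *values = line.split()
    # Kernels that print fewer columns simply yield fewer keys.
    return name, dict(zip(FIELDS, map(int, values)))


def busy_fraction(before, after):
    """Fraction of non-idle time between two samples of the same cpu line."""
    keys = FIELDS[:8]            # guest/guest_nice are already in user/nice
    delta = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    total = sum(delta.values())
    idle = delta["idle"] + delta["iowait"]
    return (total - idle) / total if total else 0.0


_, t0 = parse_cpu_line("cpu  2255 34 2290 22625563 6290 127 456 0 0 0")
_, t1 = parse_cpu_line("cpu  2355 34 2290 22625863 6290 127 456 0 0 0")
print(busy_fraction(t0, t1))     # 100 busy ticks out of 400
```
+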
+The "intr" line gives counts of interrupts serviced since boot time, for each
+of the possible system interrupts. The first column is the total of all
+interrupts serviced including unnumbered architecture specific interrupts;
+each subsequent column is the total for that particular numbered interrupt.
+Unnumbered interrupts are not shown, only summed into the total.
+
+The "ctxt" line gives the total number of context switches across all CPUs.
+
+The "btime" line gives the time at which the system booted, in seconds since
+the Unix epoch.
+
+The "processes" line gives the number of processes and threads created, which
+includes (but is not limited to) those created by calls to the fork() and
+clone() system calls.
+
+The "procs_running" line gives the total number of threads that are
+running or ready to run (i.e., the total number of runnable threads).
+
+The "procs_blocked" line gives the number of processes currently blocked,
+waiting for I/O to complete.
+
+The "softirq" line gives counts of softirqs serviced since boot time, for each
+of the possible system softirqs. The first column is the total of all
+softirqs serviced; each subsequent column is the total for that particular
+softirq.
+
+
+1.9 Ext4 file system parameters
+-------------------------------
+
+Information about mounted ext4 file systems can be found in
+/proc/fs/ext4. Each mounted filesystem will have a directory in
+/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
+/proc/fs/ext4/dm-0). The files in each per-device directory are shown
+in Table 1-12, below.
+
+.. table:: Table 1-12: Files in /proc/fs/ext4/<devname>
+
+ ==============  ==========================================================
+ File            Content
+ ==============  ==========================================================
+ mb_groups       details of multiblock allocator buddy cache of free blocks
+ ==============  ==========================================================
+
+2.0 /proc/consoles
+------------------
+Shows registered system console lines.
+
+To see which character device lines are currently used for the system console
+/dev/console, you may simply look into the file /proc/consoles::
+
+  > cat /proc/consoles
+  tty0                 -WU (ECp)       4:7
+  ttyS0                -W- (Ep)        4:64
+
+The columns are:
+
++--------------------+-------------------------------------------------------+
+| device             | name of the device                                    |
++====================+=======================================================+
+| operations         | * R = can do read operations                          |
+|                    | * W = can do write operations                         |
+|                    | * U = can do unblank                                  |
++--------------------+-------------------------------------------------------+
+| flags              | * E = it is enabled                                   |
+|                    | * C = it is the preferred console                     |
+|                    | * B = it is the primary boot console                  |
+|                    | * p = it is used for printk buffer                    |
+|                    | * b = it is not a TTY but a Braille device            |
+|                    | * a = it is safe to use when the CPU is offline       |
++--------------------+-------------------------------------------------------+
+| major:minor        | major and minor number of the device separated by a   |
+|                    | colon                                                 |
++--------------------+-------------------------------------------------------+
+
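+Each line of /proc/consoles can be split into the columns described above. The
+following Python sketch assumes the layout of the example output; the regular
+expression and ``parse_console_line()`` are hypothetical, and blank padding
+inside the parentheses is tolerated in case unset flags are printed as spaces.
+
```python
import re

# name, "RWU"-style operations, "(flags)" and an optional major:minor.
CONSOLE_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<ops>[R-][W-][U-])\s+\((?P<flags>[^)]*)\)"
    r"(?:\s+(?P<major>\d+):(?P<minor>\d+))?")


def parse_console_line(line):
    """Split one /proc/consoles line into name, operations, flags and device."""
    m = CONSOLE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    dev = None
    if m.group("major") is not None:
        dev = (int(m.group("major")), int(m.group("minor")))
    return {
        "name": m.group("name"),
        "ops": {c for c in m.group("ops") if c != "-"},
        "flags": {c for c in m.group("flags") if not c.isspace()},
        "dev": dev,
    }


print(parse_console_line("tty0                 -WU (ECp)       4:7"))
```
+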
+Summary
+-------
+
+The /proc file system serves information about the running system. It not only
+allows access to process data but also allows you to request the kernel status
+by reading files in the hierarchy.
+
+The directory structure of /proc reflects the types of information and makes
+it easy, if not obvious, to know where to look for specific data.
+
+Chapter 2: Modifying System Parameters
+======================================
+
+In This Chapter
+---------------
+
+* Modifying kernel parameters by writing into files found in /proc/sys
+* Exploring the files which modify certain parameters
+* Review of the /proc/sys file tree
+
+------------------------------------------------------------------------------
+
+A very interesting part of /proc is the directory /proc/sys. This is not only
+a source of information, it also allows you to change parameters within the
+kernel. Be very careful when attempting this. You can optimize your system,
+but you can also cause it to crash. Never alter kernel parameters on a
+production system. Set up a development machine and test to make sure that
+everything works the way you want it to. You may have no alternative but to
+reboot the machine once an error has been made.
+
+To change a value, simply echo the new value into the file. An example is
+given below in the section on the file system data. You need to be root to do
+this. You can create your own boot script to perform this every time your
+system boots.
+
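+The same write can be scripted. The Python sketch below is a hypothetical
+helper, not an existing tool. It relies on the convention, also used by the
+sysctl(8) utility, that the dots in a parameter name such as
+vm.overcommit_ratio correspond to directories under /proc/sys.
+
```python
from pathlib import Path

PROC_SYS = Path("/proc/sys")


def sysctl_path(name):
    """Map a dotted name such as 'vm.overcommit_ratio' to its /proc/sys file."""
    return PROC_SYS.joinpath(*name.split("."))


def write_sysctl(name, value):
    """Equivalent of 'echo value > /proc/sys/...'; requires root."""
    sysctl_path(name).write_text(f"{value}\n")


print(sysctl_path("vm.overcommit_ratio"))   # /proc/sys/vm/overcommit_ratio
```
+
+This naive mapping does not handle parameter names whose components themselves
+contain dots, such as some VLAN interface names under net.ipv4.conf.
+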
+The files in /proc/sys can be used to fine-tune and monitor miscellaneous and
+general things in the operation of the Linux kernel. Since some of the files
+can inadvertently disrupt your system, it is advisable to read both
+documentation and source before actually making adjustments. In any case, be
+very careful when writing to any of these files. The entries in /proc may
+change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
+review the kernel documentation in the directory /usr/src/linux/Documentation.
+This chapter is heavily based on the documentation included in the pre-2.2
+kernels, and became part of it in version 2.2.1 of the Linux kernel.
+
+Please see the Documentation/admin-guide/sysctl/ directory for descriptions of
+these entries.
+
+Summary
+-------
+
+Certain aspects of kernel behavior can be modified at runtime, without the
+need to recompile the kernel, or even to reboot the system. The files in the
+/proc/sys tree can not only be read, but also modified. You can use the echo
+command to write values into these files, thereby changing the default
+settings of the kernel.
+
+
+Chapter 3: Per-process Parameters
+=================================
+
+3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer score
+---------------------------------------------------------------------------------
+
+These files can be used to adjust the badness heuristic used to select which
+process gets killed in out-of-memory conditions.
+
+The badness heuristic assigns a value to each candidate task ranging from 0
+(never kill) to 1000 (always kill) to determine which process is targeted.  The
+units are roughly a proportion along that range of allowed memory the process
+may allocate from based on an estimation of its current memory and swap use.
+For example, if a task is using all allowed memory, its badness score will be
+1000.  If it is using half of its allowed memory, its score will be 500.
+
+There is an additional factor included in the badness score: the current memory
+and swap usage is discounted by 3% for root processes.
+
+The amount of "allowed" memory depends on the context in which the oom killer
+was called.  If it is due to the memory assigned to the allocating task's cpuset
+being exhausted, the allowed memory represents the set of mems assigned to that
+cpuset.  If it is due to a mempolicy's node(s) being exhausted, the allowed
+memory represents the set of mempolicy nodes.  If it is due to a memory
+limit (or swap limit) being reached, the allowed memory is that configured
+limit.  Finally, if it is due to the entire system being out of memory, the
+allowed memory represents all allocatable resources.
+
+The value of /proc/<pid>/oom_score_adj is added to the badness score before it
+is used to determine which task to kill.  Acceptable values range from -1000
+(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows userspace to
+polarize the preference for oom killing either by always preferring a certain
+task or completely disabling it.  The lowest possible value, -1000, is
+equivalent to disabling oom killing entirely for that task since it will always
+report a badness score of 0.
+
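+The arithmetic described above can be summarized in a small model. This Python
+sketch is a deliberate simplification, not the kernel's own implementation:
+``approx_badness()`` is a hypothetical name, it ignores the root discount, and
+it floors the result at 0 for readability.
+
```python
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def approx_badness(used, allowed, oom_score_adj=0):
    """Rough model of the badness score; not the kernel's exact algorithm.

    used and allowed must be in the same unit (for example pages).
    """
    if not OOM_SCORE_ADJ_MIN <= oom_score_adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be within -1000..1000")
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0                           # oom killing disabled for this task
    base = 1000 * used // allowed          # 0 = uses nothing, 1000 = uses all
    return max(0, base + oom_score_adj)    # floored at 0 in this model


print(approx_badness(512, 1024))           # 500: half of the allowed memory
print(approx_badness(768, 1024, -500))     # 250: biased away from being killed
```
+
+With 768 of 1024 allowed units in use the base score is 750; an oom_score_adj
+of -500 lowers it to 250, in line with the "discount 50% of the allowed
+memory" reading above.
+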
+Consequently, it is very simple for userspace to define the amount of memory to
+consider for each task.  Setting a /proc/<pid>/oom_score_adj value of +500, for
+example, is roughly equivalent to allowing the remainder of tasks sharing the
+same system, cpuset, mempolicy, or memory controller resources to use at least
+50% more memory.  A value of -500, on the other hand, would be roughly
+equivalent to discounting 50% of the task's allowed memory from being considered
+as scoring against the task.
+
+For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
+be used to tune the badness score.  Its acceptable values range from -16
+(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
+(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
+scaled linearly with /proc/<pid>/oom_score_adj.
+
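+The linear scaling between the two interfaces can be sketched as follows. This
+Python version is based on our reading of the kernel's conversion for writes
+to oom_adj, which truncates like C integer division and special-cases
+OOM_ADJUST_MAX so that +15 still maps to the maximum of +1000; treat the
+helper name and that special case as assumptions to check against your
+kernel's source.
+
```python
OOM_DISABLE = -17
OOM_ADJUST_MAX = 15
OOM_SCORE_ADJ_MAX = 1000


def oom_adj_to_score_adj(oom_adj):
    """Scale a legacy oom_adj value (-17..15) to oom_score_adj units."""
    if not OOM_DISABLE <= oom_adj <= OOM_ADJUST_MAX:
        raise ValueError("oom_adj must be within -17..15")
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX           # assumed special case: 15 -> 1000
    num = oom_adj * OOM_SCORE_ADJ_MAX
    den = -OOM_DISABLE                     # 17
    # Truncate toward zero, as C integer division does.
    return num // den if num >= 0 else -(-num // den)


for value in (-17, -16, 0, 8, 15):
    print(value, oom_adj_to_score_adj(value))
```
+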
+The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
+value set by a CAP_SYS_RESOURCE process. Reducing the value any lower
+requires CAP_SYS_RESOURCE.
+
+Caveat: when a parent task is selected, the oom killer will sacrifice any first
+generation children with separate address spaces instead, if possible. This
+prevents servers and important system daemons from being killed and loses the
+minimal amount of work.
+
+
+3.2 /proc/<pid>/oom_score - Display current oom-killer score
+-------------------------------------------------------------
+
+This file can be used to check the current score used by the oom-killer for
+any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
+process should be killed in an out-of-memory situation.
+
+
+3.3 /proc/<pid>/io - Display the IO accounting fields
+-------------------------------------------------------
+
+This file contains IO statistics for each running process.
+
+Example
+~~~~~~~
+
+::
+
+    test:/tmp # dd if=/dev/zero of=/tmp/test.dat &
+    [1] 3828
+
+    test:/tmp # cat /proc/3828/io
+    rchar: 323934931
+    wchar: 323929600
+    syscr: 632687
+    syscw: 632675
+    read_bytes: 0
+    write_bytes: 323932160
+    cancelled_write_bytes: 0
+
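+The file is a simple list of ``name: value`` pairs, so it is easy to consume
+from a script. A minimal Python sketch follows; ``parse_proc_io()`` is a
+hypothetical helper, and subtracting cancelled_write_bytes from write_bytes is
+only an estimate of the net write traffic (see the caveats below).
+
```python
def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            counters[key.strip()] = int(value)
    return counters


sample = """\
rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
"""
io = parse_proc_io(sample)
# Estimated bytes sent to the storage layer after discarded dirty pages.
print(io["write_bytes"] - io["cancelled_write_bytes"])
```
+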
+
+Description
+~~~~~~~~~~~
+
+rchar
+^^^^^
+
+I/O counter: chars read
+
+The number of bytes which this task has caused to be read from storage. This
+is simply the sum of bytes which this process passed to read() and pread().
+It includes things like tty IO and it is unaffected by whether or not actual
+physical disk IO was required (the read might have been satisfied from
+pagecache).
+
+
+wchar
+^^^^^
+
+I/O counter: chars written
+
+The number of bytes which this task has caused, or shall cause, to be written
+to disk. Similar caveats apply here as with rchar.
+
+
+syscr
+^^^^^
+
+I/O counter: read syscalls
+
+Attempt to count the number of read I/O operations, i.e. syscalls like read()
+and pread().
+
+
+syscw
+^^^^^
+
+I/O counter: write syscalls
+
+Attempt to count the number of write I/O operations, i.e. syscalls like
+write() and pwrite().
+
+
+read_bytes
+^^^^^^^^^^
+
+I/O counter: bytes read
+
+Attempt to count the number of bytes which this process really did cause to
+be fetched from the storage layer. Done at the submit_bio() level, so it is
+accurate for block-backed filesystems. <please add status regarding NFS and
+CIFS at a later time>
+
+
+write_bytes
+^^^^^^^^^^^
+
+I/O counter: bytes written
+
+Attempt to count the number of bytes which this process caused to be sent to
+the storage layer. This is done at page-dirtying time.
+
+
+cancelled_write_bytes
+^^^^^^^^^^^^^^^^^^^^^
+
+The number of bytes which this process caused to not happen, by truncating
+pagecache. This covers the big inaccuracy in write_bytes, which is truncate:
+if a process writes 1MB to a file and then deletes the file, it will in fact
+perform no writeout, but it will have been accounted as having caused 1MB of
+write.
+
+A task can cause "negative" IO too. If this task truncates some dirty
+pagecache, some IO which another task has been accounted for (in its
+write_bytes) will not be happening. We _could_ just subtract that from the
+truncating task's write_bytes, but there is information loss in doing that.
+
+
+.. Note::
+
+   At its current implementation state, this is a bit racy on 32-bit machines:
+   if process A reads process B's /proc/pid/io while process B is updating one
+   of those 64-bit counters, process A could see an intermediate result.
+
+
+More information about this can be found within the taskstats documentation in
+Documentation/accounting.
+
+3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
+---------------------------------------------------------------
+When a process is dumped, all anonymous memory is written to a core file as
+long as the size of the core file isn't limited. But sometimes we don't want
+to dump some memory segments, for example, huge shared memory or DAX.
+Conversely, sometimes we want to save file-backed memory segments into a core
+file, not only the individual files.
+
+/proc/<pid>/coredump_filter allows you to customize which memory segments
+will be dumped when the <pid> process is dumped. coredump_filter is a bitmask
+of memory types. If a bit of the bitmask is set, memory segments of the
+corresponding memory type are dumped, otherwise they are not dumped.
+
+The following 9 memory types are supported:
+
+  - (bit 0) anonymous private memory
+  - (bit 1) anonymous shared memory
+  - (bit 2) file-backed private memory
+  - (bit 3) file-backed shared memory
+  - (bit 4) ELF header pages in file-backed private memory areas (effective
+    only if bit 2 is cleared)
+  - (bit 5) hugetlb private memory
+  - (bit 6) hugetlb shared memory
+  - (bit 7) DAX private memory
+  - (bit 8) DAX shared memory
+
+  Note that MMIO pages such as frame buffers are never dumped and vDSO pages
+  are always dumped regardless of the bitmask status.
+
+  Note that bits 0-4 don't affect hugetlb or DAX memory. hugetlb memory is
+  only affected by bits 5-6, and DAX is only affected by bits 7-8.
+
+The default value of coredump_filter is 0x33; this means all anonymous memory
+segments, ELF header pages and hugetlb private memory are dumped.
+
+If you don't want to dump all shared memory segments attached to pid 1234,
+write 0x31 to the process's proc file::
+
+  $ echo 0x31 > /proc/1234/coredump_filter
+
+When a new process is created, the process inherits the bitmask status from its
+parent. It is useful to set up coredump_filter before the program runs.
+For example::
+
+  $ echo 0x7 > /proc/self/coredump_filter
+  $ ./some_program
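The bit layout above maps directly onto C constants. The sketch below uses hypothetical COREDUMP_* names; these are not kernel or libc identifiers. It shows how the documented masks are built. It also includes a small helper that writes a mask to /proc/self/coredump_filter, so that programs started afterwards inherit it.

```c
#include <stdio.h>

/* Hypothetical names for the coredump_filter bits listed above. */
enum {
    COREDUMP_ANON_PRIVATE    = 1u << 0,
    COREDUMP_ANON_SHARED     = 1u << 1,
    COREDUMP_FILE_PRIVATE    = 1u << 2,
    COREDUMP_FILE_SHARED     = 1u << 3,
    COREDUMP_ELF_HEADERS     = 1u << 4,
    COREDUMP_HUGETLB_PRIVATE = 1u << 5,
    COREDUMP_HUGETLB_SHARED  = 1u << 6,
    COREDUMP_DAX_PRIVATE     = 1u << 7,
    COREDUMP_DAX_SHARED      = 1u << 8,
};

/* The documented default, 0x33. */
#define COREDUMP_DEFAULT (COREDUMP_ANON_PRIVATE | COREDUMP_ANON_SHARED | \
                          COREDUMP_ELF_HEADERS | COREDUMP_HUGETLB_PRIVATE)

/* Every "shared" bit; clearing these from the default gives 0x31. */
#define COREDUMP_ALL_SHARED (COREDUMP_ANON_SHARED | COREDUMP_FILE_SHARED | \
                             COREDUMP_HUGETLB_SHARED | COREDUMP_DAX_SHARED)

/* Hypothetical helper: write mask to /proc/self/coredump_filter, which
 * child processes inherit. Returns 0 on success, -1 on error. */
int set_coredump_filter(unsigned int mask)
{
    FILE *f = fopen("/proc/self/coredump_filter", "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "0x%x\n", mask) > 0;
    /* fclose() flushes the buffered text, so the kernel sees it here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For instance, `set_coredump_filter(COREDUMP_DEFAULT & ~COREDUMP_ALL_SHARED)` writes 0x31. This has the same effect as the `echo 0x31` example, but applies to the calling process instead of pid 1234.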
+
+3.5	/proc/<pid>/mountinfo - Information about mounts
+--------------------------------------------------------
+
+This file contains lines of the form::
+
+    36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
+    (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
+
+    (1) mount ID:  unique identifier of the mount (may be reused after umount)
+    (2) parent ID:  ID of parent (or of self for the top of the mount tree)
+    (3) major:minor:  value of st_dev for files on filesystem
+    (4) root:  root of the mount within the filesystem
+    (5) mount point:  mount point relative to the process's root
+    (6) mount options:  per mount options
+    (7) optional fields:  zero or more fields of the form "tag[:value]"
+    (8) separator:  marks the end of the optional fields
+    (9) filesystem type:  name of filesystem of the form "type[.subtype]"
+    (10) mount source:  filesystem specific information or "none"
+    (11) super options:  per super block options
+
+Parsers should ignore all unrecognised optional fields.  Currently the
+possible optional fields are:
+
+================  ==============================================================
+shared:X          mount is shared in peer group X
+master:X          mount is slave to peer group X
+propagate_from:X  mount is slave and receives propagation from peer group X [#]_
+unbindable        mount is unbindable
+================  ==============================================================
+
+.. [#] X is the closest dominant peer group under the process's root.  If
+       X is the immediate master of the mount, or if there's no dominant peer
+       group under the same root, then only the "master:X" field is present
+       and not the "propagate_from:X" field.
+
+For more information on mount propagation see:
+
+  Documentation/filesystems/sharedsubtree.txt
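The C function below is a rough sketch of the parsing rules above. It splits one mountinfo line into its numbered fields, and it counts and otherwise ignores the optional fields up to the "-" separator. The struct layout, its buffer sizes and the name `parse_mountinfo_line()` are assumptions made for this example. The function does not decode the octal escapes (such as `\040` for a space) that the kernel uses for special characters in paths.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/<pid>/mountinfo (buffer sizes are arbitrary). */
struct mountinfo {
    int mount_id, parent_id;                      /* fields (1), (2) */
    unsigned int major, minor;                    /* field (3) */
    char root[256], mount_point[256], mount_opts[256];
    int n_optional;                               /* count of field (7) */
    char fstype[64], source[256], super_opts[256];
};

/* Hypothetical helper: returns 0 on success, -1 on a malformed line. */
int parse_mountinfo_line(const char *line, struct mountinfo *mi)
{
    int pos = 0, len = 0;
    char tok[256];

    /* Fields (1)-(6); %n records how far the scan got. */
    if (sscanf(line, "%d %d %u:%u %255s %255s %255s%n",
               &mi->mount_id, &mi->parent_id, &mi->major, &mi->minor,
               mi->root, mi->mount_point, mi->mount_opts, &pos) != 7)
        return -1;

    /* Fields (7) and (8): zero or more "tag[:value]" entries, then "-". */
    mi->n_optional = 0;
    for (;;) {
        if (sscanf(line + pos, "%255s%n", tok, &len) != 1)
            return -1;            /* input ended without a separator */
        pos += len;
        if (strcmp(tok, "-") == 0)
            break;
        mi->n_optional++;         /* unrecognised fields are just skipped */
    }

    /* Fields (9)-(11). */
    if (sscanf(line + pos, "%63s %255s %255s",
               mi->fstype, mi->source, mi->super_opts) != 3)
        return -1;
    return 0;
}
```

Called on the example line at the top of this section, it yields mount ID 36, parent ID 35, device 98:0, one optional field (master:1) and filesystem type ext3.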
+
+
+3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
+--------------------------------------------------------
+These files provide a method to access a task's comm value. They also allow a
+task to set its own comm value or that of one of its thread siblings. The comm
+value is limited in size compared to the cmdline value, so writing anything
+longer than the kernel's TASK_COMM_LEN (currently 16 bytes, including the
+terminating NUL) will result in a truncated comm value.
+
+
+3.7	/proc/<pid>/task/<tid>/children - Information about task children
+-------------------------------------------------------------------------
+This file provides a fast way to retrieve the first level children pids
+of a task identified by the <pid>/<tid> pair. The format is a space separated
+stream of pids.
+
+Note the "first level" here: if a child has its own children, they will
+not be listed here; one needs to read /proc/<children-pid>/task/<tid>/children
+to obtain the descendants.
+
+Since this interface is intended to be fast and cheap, it doesn't
+guarantee precise results, and some children might be skipped,
+especially if they exited right after their pids were printed.
+One needs to either stop or freeze the processes being inspected
+if precise results are needed.
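The children file is just a space-separated list, so turning it into numbers needs nothing more than strtol(3). The sketch below is hypothetical: `parse_children()` is not an existing API. It assumes the file contents have already been read into a NUL-terminated string.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse the space-separated pid list read from
 * /proc/<pid>/task/<tid>/children into pids[] (at most max entries).
 * Returns the number of pids stored.
 */
int parse_children(const char *text, int *pids, int max)
{
    int n = 0;
    char *end;

    while (n < max) {
        /* strtol() skips leading whitespace by itself. */
        long v = strtol(text, &end, 10);
        if (end == text)          /* no further number in the input */
            break;
        pids[n++] = (int)v;
        text = end;
    }
    return n;
}
```

As noted above, the list may miss children that exit concurrently. Callers that need an exact answer must still stop or freeze the inspected processes before reading it.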
+
+
+3.8	/proc/<pid>/fdinfo/<fd> - Information about opened file
+---------------------------------------------------------------
+This file provides information associated with an opened file. Regular
+files have at least three fields: 'pos', 'flags' and 'mnt_id'. The 'pos'
+field represents the current offset of the opened file in decimal form [see
+lseek(2) for details], 'flags' denotes the octal O_xxx mask the file has been
+created with [see open(2) for details] and 'mnt_id' represents the mount ID of
+the file system containing the opened file [see 3.5 /proc/<pid>/mountinfo
+for details].
+
+A typical output is::
+
+	pos:	0
+	flags:	0100002
+	mnt_id:	19
+
+All locks associated with a file descriptor are shown in its fdinfo too::
+
+    lock:       1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF
+
+Files such as eventfd, fsnotify, signalfd and epoll provide, in addition to
+the regular pos/flags pair, information particular to the objects they represent.
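Below is a minimal C sketch for reading the three common fields. It assumes the "name:<tab>value" layout of the typical output shown earlier and decodes 'flags' as octal. The struct and `parse_fdinfo_basic()` are hypothetical names. Any type-specific lines, such as locks or eventfd-count, are simply skipped.

```c
#include <stdio.h>
#include <string.h>

/* The three fields present for every regular file in fdinfo. */
struct fdinfo_basic {
    long long pos;        /* decimal offset, see lseek(2) */
    unsigned int flags;   /* octal O_xxx flags, see open(2) */
    int mnt_id;           /* matches field (1) of mountinfo */
};

/*
 * Hypothetical helper: scan /proc/<pid>/fdinfo/<fd> text line by line.
 * Returns 0 if pos, flags and mnt_id were all found, else -1.
 */
int parse_fdinfo_basic(const char *text, struct fdinfo_basic *out)
{
    int found = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Each pattern only matches a line that starts with that name. */
        if (sscanf(line, "pos: %lld", &out->pos) == 1)
            found |= 1;
        else if (sscanf(line, "flags: %o", &out->flags) == 1)
            found |= 2;
        else if (sscanf(line, "mnt_id: %d", &out->mnt_id) == 1)
            found |= 4;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return found == 7 ? 0 : -1;
}
```

With the typical output above, this gives pos 0, flags 0100002 and mnt_id 19. Since `(flags & O_ACCMODE) == O_RDWR` (constants from `<fcntl.h>`), the file was opened read-write.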
+
+Eventfd files
+~~~~~~~~~~~~~
+
+::
+
+	pos:	0
+	flags:	04002
+	mnt_id:	9
+	eventfd-count:	5a
+
+where 'eventfd-count' is the hex value of the counter.
+
+Signalfd files
+~~~~~~~~~~~~~~
+
+::
+
+	pos:	0
+	flags:	04002
+	mnt_id:	9
+	sigmask:	0000000000000200
+
+where 'sigmask' is the hex value of the signal mask associated
+with the file.
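Like the SigBlk/SigCgt masks in /proc/<pid>/status, 'sigmask' uses bit n for signal n + 1. The example value 0000000000000200 has only bit 9 set, so it stands for signal 10, which is SIGUSR1 on x86. The hypothetical helper below decodes such a value into signal numbers. It assumes at most 64 signals, which covers the 16 hex digits shown.

```c
#include <stdio.h>

/*
 * Hypothetical helper: decode the hex 'sigmask' value of a signalfd
 * fdinfo entry. Bit n stands for signal n + 1. Stores up to max signal
 * numbers in sigs and returns how many bits were set (which may exceed
 * max), or -1 if the value does not parse.
 */
int sigmask_signals(const char *hex, int *sigs, int max)
{
    unsigned long long mask;
    int count = 0;

    if (sscanf(hex, "%llx", &mask) != 1)
        return -1;
    for (int bit = 0; bit < 64; bit++) {
        if (mask & (1ULL << bit)) {
            if (count < max)
                sigs[count] = bit + 1;
            count++;
        }
    }
    return count;
}
```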
+
+Epoll files
+~~~~~~~~~~~
+
+::
+
+	pos:	0
+	flags:	02
+	mnt_id:	9
+	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
+
+where 'tfd' is a target file descriptor number in decimal form,
+'events' is the mask of events being watched and 'data' is the data
+associated with the target [see epoll(7) for more details], both in hex.
+
+The 'pos' is the current offset of the target file in decimal form
+[see lseek(2)], and 'ino' and 'sdev' are the inode and device numbers
+where the target file resides, in hex format.
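The 'tfd' line can be split with a single sscanf(3) call. This sketch assumes the field order of the example above. The struct and `parse_epoll_tfd()` are hypothetical names, and other kernel versions may print additional fields.

```c
#include <stdio.h>

/* Fields of one "tfd:" line in an epoll fdinfo file. */
struct epoll_tfd {
    int tfd;                    /* target fd, decimal */
    unsigned int events;        /* watched events, hex */
    unsigned long long data;    /* user data, hex */
    long long pos;              /* target file offset, decimal */
    unsigned long ino;          /* inode number, hex */
    unsigned int sdev;          /* device number, hex */
};

/* Hypothetical helper: returns 0 if all six fields parsed, else -1.
 * Spaces in the format match any run of blanks, including none. */
int parse_epoll_tfd(const char *line, struct epoll_tfd *t)
{
    int n = sscanf(line,
                   "tfd: %d events: %x data: %llx pos:%lld ino:%lx sdev:%x",
                   &t->tfd, &t->events, &t->data, &t->pos,
                   &t->ino, &t->sdev);
    return n == 6 ? 0 : -1;
}
```

For the example line, the events value 0x1d decodes as EPOLLIN | EPOLLOUT | EPOLLERR | EPOLLHUP, using the flag values from `<sys/epoll.h>`.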
+
+Fsnotify files
+~~~~~~~~~~~~~~
+For inotify files the format is the following::
+
+	pos:	0
+	flags:	02000000
+	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
+
+where 'wd' is a watch descriptor in decimal form, i.e. the value returned
+by inotify_add_watch(2); 'ino' and 'sdev' are the inode and device where the
+target file resides and 'mask' is the mask of events, these three in hex
+form [see inotify(7) for more details].
+
+If the kernel was built with exportfs support, the path to the target
+file is encoded as a file handle.  The file handle is provided by three
+fields 'fhandle-bytes', 'fhandle-type' and 'f_handle', all in hex
+format.
+
+If the kernel is built without exportfs support the file handle won't be
+printed out.
+
+If there is no inotify mark attached yet the 'inotify' line will be omitted.
+
+For fanotify files the format is::
+
+	pos:	0
+	flags:	02
+	mnt_id:	9
+	fanotify flags:10 event-flags:0
+	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
+	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
+
+where fanotify 'flags' and 'event-flags' are values used in the fanotify_init
+call, 'mnt_id' is the mount point identifier, 'mflags' is the value of the
+flags associated with the mark, which are tracked separately from the events
+mask. 'ino' and 'sdev' are the target inode and device, 'mask' is the events
+mask and 'ignored_mask' is the mask of events which are to be ignored.
+All values are in hex format. Together, 'mflags', 'mask' and 'ignored_mask'
+provide information about the flags and mask used in the fanotify_mark
+call [see fanotify_mark(2) for details].
+
+While the first three lines are mandatory and always printed, the rest is
+optional and may be omitted if no marks have been created yet.
+
+Timerfd files
+~~~~~~~~~~~~~
+
+::
+
+	pos:	0
+	flags:	02
+	mnt_id:	9
+	clockid: 0
+	ticks: 0
+	settime flags: 01
+	it_value: (0, 49406829)
+	it_interval: (1, 0)
+
+where 'clockid' is the clock type and 'ticks' is the number of timer expirations
+that have occurred [see timerfd_create(2) for details]. 'settime flags' are the
+flags, in octal form, that were used to set up the timer [see
+timerfd_settime(2) for details]. 'it_value' is the remaining time until the
+timer expires, and 'it_interval' is the interval for the timer; both are shown
+as (seconds, nanoseconds). Note that the timer might be set up with the
+TFD_TIMER_ABSTIME option, which will be shown in 'settime flags', but
+'it_value' still shows the timer's remaining time.
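'it_value' and 'it_interval' are printed as a (seconds, nanoseconds) pair, but a caller usually wants them as a single number. The hypothetical helper below does that conversion for one line. It assumes the "name: (sec, nsec)" layout of the example above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: convert a timerfd fdinfo line such as
 * "it_value: (0, 49406829)" into nanoseconds, provided the field name
 * equals name. Returns 0 on success, -1 otherwise.
 */
int timerfd_field_ns(const char *line, const char *name,
                     unsigned long long *ns)
{
    char key[32];
    unsigned long long sec, nsec;

    /* Read the name up to ':' and then the "(sec, nsec)" pair. */
    if (sscanf(line, "%31[^:]: (%llu, %llu)", key, &sec, &nsec) != 3)
        return -1;
    if (strcmp(key, name) != 0)
        return -1;
    *ns = sec * 1000000000ULL + nsec;
    return 0;
}
```

Applied to the example, it_value becomes 49406829 ns (about 49 ms) and it_interval becomes 1000000000 ns, i.e. a one-second period.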
+
+3.9	/proc/<pid>/map_files - Information about memory mapped files
+---------------------------------------------------------------------
+This directory contains symbolic links which represent memory mapped files
+the process is maintaining.  Example output::
+
+     lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so
+     lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so
+     lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so
+     ...
+     lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1
+     lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls
+
+The name of a link represents the virtual memory bounds of a mapping, i.e.
+vm_area_struct::vm_start-vm_area_struct::vm_end.
+
+The main purpose of the map_files is to retrieve a set of memory mapped
+files in a fast way instead of parsing /proc/<pid>/maps or
+/proc/<pid>/smaps, both of which contain many more records.  At the same
+time one can open(2) mappings from the listings of two processes and
+compare their inode numbers to figure out which anonymous memory areas
+are actually shared.
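Each entry name in map_files encodes vm_start and vm_end in hex, so the size of a mapping can be recovered without following the link. The sketch below assumes the "start-end" naming shown in the example output. `map_files_range()` is a hypothetical helper. It uses unsigned long long so that addresses like those in the example fit regardless of the size of long.

```c
#include <stdio.h>

/*
 * Hypothetical helper: split a map_files entry name such as
 * "333c600000-333c620000" into start and end addresses (hex).
 * Returns 0 on success, -1 if the name does not have that shape.
 */
int map_files_range(const char *name, unsigned long long *start,
                    unsigned long long *end)
{
    char extra;

    /* If the trailing %c also converts, there is junk after the range. */
    if (sscanf(name, "%llx-%llx%c", start, end, &extra) != 2)
        return -1;
    return *start < *end ? 0 : -1;
}
```

For the first entry above, 333c600000-333c620000, this gives a 0x20000-byte (128 KiB) mapping of ld-2.18.so.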
+
+3.10	/proc/<pid>/timerslack_ns - Task timerslack value
+---------------------------------------------------------
+This file provides the task's timerslack value in nanoseconds.
+This value specifies an amount of time that normal timers may be deferred
+in order to coalesce timers and avoid unnecessary wakeups.
+
+This allows a task's interactivity vs power consumption trade-off to be
+adjusted.
+
+Writing 0 to the file will set the task's timerslack to the default value.
+
+Valid values are from 0 to ULLONG_MAX.
+
+An application setting the value must have PTRACE_MODE_ATTACH_FSCREDS level
+permissions on the task specified to change its timerslack_ns value.
+
+3.11	/proc/<pid>/patch_state - Livepatch patch operation state
+-----------------------------------------------------------------
+When CONFIG_LIVEPATCH is enabled, this file displays the value of the
+patch state for the task.
+
+A value of '-1' indicates that no patch is in transition.
+
+A value of '0' indicates that a patch is in transition and the task is
+unpatched.  If the patch is being enabled, then the task hasn't been
+patched yet.  If the patch is being disabled, then the task has already
+been unpatched.
+
+A value of '1' indicates that a patch is in transition and the task is
+patched.  If the patch is being enabled, then the task has already been
+patched.  If the patch is being disabled, then the task hasn't been
+unpatched yet.
+
+3.12 /proc/<pid>/arch_status - task architecture specific status
+-------------------------------------------------------------------
+When CONFIG_PROC_PID_ARCH_STATUS is enabled, this file displays the
+architecture specific status of the task.
+
+Example
+~~~~~~~
+
+::
+
+ $ cat /proc/6753/arch_status
+ AVX512_elapsed_ms:      8
+
+Description
+~~~~~~~~~~~
+
+x86 specific entries:
+~~~~~~~~~~~~~~~~~~~~~
+
+AVX512_elapsed_ms:
+^^^^^^^^^^^^^^^^^^
+
+  If AVX512 is supported on the machine, this entry shows the milliseconds
+  elapsed since the last time AVX512 usage was recorded. The recording
+  happens on a best effort basis when a task is scheduled out. This means
+  that the value depends on two factors:
+
+    1) The time which the task spent on the CPU without being scheduled
+       out. With CPU isolation and a single runnable task this can take
+       several seconds.
+
+    2) The time since the task was scheduled out last. Depending on the
+       reason for being scheduled out (time slice exhausted, syscall ...)
+       this can be an arbitrarily long time.
+
+  As a consequence the value cannot be considered precise and authoritative
+  information. The application which uses this information has to be aware
+  of the overall scenario on the system in order to determine whether a
+  task is a real AVX512 user or not. Precise information can be obtained
+  with performance counters.
+
+  A special value of '-1' indicates that no AVX512 usage was recorded, thus
+  the task is unlikely to be an AVX512 user, but depending on the workload and
+  the scheduling scenario, it could also be a false negative as mentioned above.
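Given the caveats above, a consumer should treat AVX512_elapsed_ms as a hint. The sketch below shows one way to tell the three cases apart: line absent, '-1' (no usage recorded), and a non-negative elapsed time. It assumes the text of /proc/<pid>/arch_status has already been read into a string. `avx512_elapsed_ms()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract AVX512_elapsed_ms from arch_status text.
 * Returns 1 and stores the value (possibly -1, meaning no AVX512 usage
 * was recorded) if the entry is present, or 0 if it is absent or
 * malformed, e.g. on a machine without AVX512.
 */
int avx512_elapsed_ms(const char *text, long *ms)
{
    const char *p = strstr(text, "AVX512_elapsed_ms:");

    if (p == NULL)
        return 0;
    return sscanf(p, "AVX512_elapsed_ms: %ld", ms) == 1;
}
```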
+
+Configuring procfs
+------------------
+
+4.1	Mount options
+---------------------
+
+The following mount options are supported:
+
+	=========	========================================================
+	hidepid=	Set /proc/<pid>/ access mode.
+	gid=		Set the group authorized to learn process information.
+	=========	========================================================
+
+hidepid=0 means classic mode - everybody may access all /proc/<pid>/ directories
+(default).
+
+hidepid=1 means users may not access any /proc/<pid>/ directories but their
+own.  Sensitive files like cmdline, sched*, status are now protected against
+other users.  This makes it impossible to learn whether any user runs a
+specific program (given the program doesn't reveal itself by its behaviour).
+As an additional bonus, as /proc/<pid>/cmdline is inaccessible to other users,
+poorly written programs passing sensitive information via program arguments are
+now protected against local eavesdroppers.
+
+hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be fully invisible to other
+users.  It doesn't hide the fact that a process with a specific pid value
+exists (it can be learned by other means, e.g. by "kill -0 $PID"),
+but it hides the process's uid and gid, which may be learned by stat()'ing
+/proc/<pid>/ otherwise.  It greatly complicates an intruder's task of gathering
+information about running processes, whether some daemon runs with elevated
+privileges, whether another user runs some sensitive program, whether other users
+run any program at all, etc.
+
+gid= defines a group authorized to learn process information otherwise
+prohibited by hidepid=.  If you use a daemon like identd which needs to learn
+information about processes, just add identd to this group.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
deleted file mode 100644
index 99ca040e3f90..000000000000
--- a/Documentation/filesystems/proc.txt
+++ /dev/null
@@ -1,2047 +0,0 @@
-------------------------------------------------------------------------------
-                       T H E  /proc   F I L E S Y S T E M
-------------------------------------------------------------------------------
-/proc/sys         Terrehon Bowden <terrehon@pacbell.net>        October 7 1999
-                  Bodo Bauer <bb@ricochet.net>
-
-2.4.x update	  Jorge Nerin <comandante@zaralinux.com>      November 14 2000
-move /proc/sys	  Shen Feng <shen@cn.fujitsu.com>		  April 1 2009
-------------------------------------------------------------------------------
-Version 1.3                                              Kernel version 2.2.12
-					      Kernel version 2.4.0-test11-pre4
-------------------------------------------------------------------------------
-fixes/update part 1.1  Stefani Seibold <stefani@seibold.net>       June 9 2009
-
-Table of Contents
------------------
-
-  0     Preface
-  0.1	Introduction/Credits
-  0.2	Legal Stuff
-
-  1	Collecting System Information
-  1.1	Process-Specific Subdirectories
-  1.2	Kernel data
-  1.3	IDE devices in /proc/ide
-  1.4	Networking info in /proc/net
-  1.5	SCSI info
-  1.6	Parallel port info in /proc/parport
-  1.7	TTY info in /proc/tty
-  1.8	Miscellaneous kernel statistics in /proc/stat
-  1.9	Ext4 file system parameters
-
-  2	Modifying System Parameters
-
-  3	Per-Process Parameters
-  3.1	/proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
-								score
-  3.2	/proc/<pid>/oom_score - Display current oom-killer score
-  3.3	/proc/<pid>/io - Display the IO accounting fields
-  3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
-  3.5	/proc/<pid>/mountinfo - Information about mounts
-  3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
-  3.7   /proc/<pid>/task/<tid>/children - Information about task children
-  3.8   /proc/<pid>/fdinfo/<fd> - Information about opened file
-  3.9   /proc/<pid>/map_files - Information about memory mapped files
-  3.10  /proc/<pid>/timerslack_ns - Task timerslack value
-  3.11	/proc/<pid>/patch_state - Livepatch patch operation state
-  3.12	/proc/<pid>/arch_status - Task architecture specific information
-
-  4	Configuring procfs
-  4.1	Mount options
-
-------------------------------------------------------------------------------
-Preface
-------------------------------------------------------------------------------
-
-0.1 Introduction/Credits
-------------------------
-
-This documentation is  part of a soon (or  so we hope) to be  released book on
-the SuSE  Linux distribution. As  there is  no complete documentation  for the
-/proc file system and we've used  many freely available sources to write these
-chapters, it  seems only fair  to give the work  back to the  Linux community.
-This work is  based on the 2.2.*  kernel version and the  upcoming 2.4.*. I'm
-afraid it's still far from complete, but we  hope it will be useful. As far as
-we know, it is the first 'all-in-one' document about the /proc file system. It
-is focused  on the Intel  x86 hardware,  so if you  are looking for  PPC, ARM,
-SPARC, AXP, etc., features, you probably  won't find what you are looking for.
-It also only covers IPv4 networking, not IPv6 nor other protocols - sorry. But
-additions and patches  are welcome and will  be added to this  document if you
-mail them to Bodo.
-
-We'd like  to  thank Alan Cox, Rik van Riel, and Alexey Kuznetsov and a lot of
-other people for help compiling this documentation. We'd also like to extend a
-special thank  you to Andi Kleen for documentation, which we relied on heavily
-to create  this  document,  as well as the additional information he provided.
-Thanks to  everybody  else  who contributed source or docs to the Linux kernel
-and helped create a great piece of software... :)
-
-If you  have  any comments, corrections or additions, please don't hesitate to
-contact Bodo  Bauer  at  bb@ricochet.net.  We'll  be happy to add them to this
-document.
-
-The   latest   version    of   this   document   is    available   online   at
-http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html
-
-If  the above  direction does  not works  for you,  you could  try the  kernel
-mailing  list  at  linux-kernel@vger.kernel.org  and/or try  to  reach  me  at
-comandante@zaralinux.com.
-
-0.2 Legal Stuff
----------------
-
-We don't  guarantee  the  correctness  of this document, and if you come to us
-complaining about  how  you  screwed  up  your  system  because  of  incorrect
-documentation, we won't feel responsible...
-
-------------------------------------------------------------------------------
-CHAPTER 1: COLLECTING SYSTEM INFORMATION
-------------------------------------------------------------------------------
-
-------------------------------------------------------------------------------
-In This Chapter
-------------------------------------------------------------------------------
-* Investigating  the  properties  of  the  pseudo  file  system  /proc and its
-  ability to provide information on the running Linux system
-* Examining /proc's structure
-* Uncovering  various  information  about the kernel and the processes running
-  on the system
-------------------------------------------------------------------------------
-
-
-The proc  file  system acts as an interface to internal data structures in the
-kernel. It  can  be  used to obtain information about the system and to change
-certain kernel parameters at runtime (sysctl).
-
-First, we'll  take  a  look  at the read-only parts of /proc. In Chapter 2, we
-show you how you can use /proc/sys to change settings.
-
-1.1 Process-Specific Subdirectories
------------------------------------
-
-The directory  /proc  contains  (among other things) one subdirectory for each
-process running on the system, which is named after the process ID (PID).
-
-The link  self  points  to  the  process reading the file system. Each process
-subdirectory has the entries listed in Table 1-1.
-
-Note that an open a file descriptor to /proc/<pid> or to any of its
-contained files or subdirectories does not prevent <pid> being reused
-for some other process in the event that <pid> exits. Operations on
-open /proc/<pid> file descriptors corresponding to dead processes
-never act on any new process that the kernel may, through chance, have
-also assigned the process ID <pid>. Instead, operations on these FDs
-usually fail with ESRCH.
-
-Table 1-1: Process specific entries in /proc
-..............................................................................
- File		Content
- clear_refs	Clears page referenced bits shown in smaps output
- cmdline	Command line arguments
- cpu		Current and last cpu in which it was executed	(2.4)(smp)
- cwd		Link to the current working directory
- environ	Values of environment variables
- exe		Link to the executable of this process
- fd		Directory, which contains all file descriptors
- maps		Memory maps to executables and library files	(2.4)
- mem		Memory held by this process
- root		Link to the root directory of this process
- stat		Process status
- statm		Process memory status information
- status		Process status in human readable form
- wchan		Present with CONFIG_KALLSYMS=y: it shows the kernel function
-		symbol the task is blocked in - or "0" if not blocked.
- pagemap	Page table
- stack		Report full stack trace, enable via CONFIG_STACKTRACE
- smaps		An extension based on maps, showing the memory consumption of
-		each mapping and flags associated with it
- smaps_rollup	Accumulated smaps stats for all mappings of the process.  This
-		can be derived from smaps, but is faster and more convenient
- numa_maps	An extension based on maps, showing the memory locality and
-		binding policy as well as mem usage (in pages) of each mapping.
-..............................................................................
-
-For example, to get the status information of a process, all you have to do is
-read the file /proc/PID/status:
-
-  >cat /proc/self/status
-  Name:   cat
-  State:  R (running)
-  Tgid:   5452
-  Pid:    5452
-  PPid:   743
-  TracerPid:      0						(2.4)
-  Uid:    501     501     501     501
-  Gid:    100     100     100     100
-  FDSize: 256
-  Groups: 100 14 16
-  VmPeak:     5004 kB
-  VmSize:     5004 kB
-  VmLck:         0 kB
-  VmHWM:       476 kB
-  VmRSS:       476 kB
-  RssAnon:             352 kB
-  RssFile:             120 kB
-  RssShmem:              4 kB
-  VmData:      156 kB
-  VmStk:        88 kB
-  VmExe:        68 kB
-  VmLib:      1412 kB
-  VmPTE:        20 kb
-  VmSwap:        0 kB
-  HugetlbPages:          0 kB
-  CoreDumping:    0
-  THP_enabled:	  1
-  Threads:        1
-  SigQ:   0/28578
-  SigPnd: 0000000000000000
-  ShdPnd: 0000000000000000
-  SigBlk: 0000000000000000
-  SigIgn: 0000000000000000
-  SigCgt: 0000000000000000
-  CapInh: 00000000fffffeff
-  CapPrm: 0000000000000000
-  CapEff: 0000000000000000
-  CapBnd: ffffffffffffffff
-  CapAmb: 0000000000000000
-  NoNewPrivs:     0
-  Seccomp:        0
-  Speculation_Store_Bypass:       thread vulnerable
-  voluntary_ctxt_switches:        0
-  nonvoluntary_ctxt_switches:     1
-
-This shows you nearly the same information you would get if you viewed it with
-the ps  command.  In  fact,  ps  uses  the  proc  file  system  to  obtain its
-information.  But you get a more detailed  view of the  process by reading the
-file /proc/PID/status. It fields are described in table 1-2.
-
-The  statm  file  contains  more  detailed  information about the process
-memory usage. Its seven fields are explained in Table 1-3.  The stat file
-contains details information about the process itself.  Its fields are
-explained in Table 1-4.
-
-(for SMP CONFIG users)
-For making accounting scalable, RSS related information are handled in an
-asynchronous manner and the value may not be very precise. To see a precise
-snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
-It's slow but very precise.
-
-Table 1-2: Contents of the status files (as of 4.19)
-..............................................................................
- Field                       Content
- Name                        filename of the executable
- Umask                       file mode creation mask
- State                       state (R is running, S is sleeping, D is sleeping
-                             in an uninterruptible wait, Z is zombie,
-			     T is traced or stopped)
- Tgid                        thread group ID
- Ngid                        NUMA group ID (0 if none)
- Pid                         process id
- PPid                        process id of the parent process
- TracerPid                   PID of process tracing this process (0 if not)
- Uid                         Real, effective, saved set, and  file system UIDs
- Gid                         Real, effective, saved set, and  file system GIDs
- FDSize                      number of file descriptor slots currently allocated
- Groups                      supplementary group list
- NStgid                      descendant namespace thread group ID hierarchy
- NSpid                       descendant namespace process ID hierarchy
- NSpgid                      descendant namespace process group ID hierarchy
- NSsid                       descendant namespace session ID hierarchy
- VmPeak                      peak virtual memory size
- VmSize                      total program size
- VmLck                       locked memory size
- VmPin                       pinned memory size
- VmHWM                       peak resident set size ("high water mark")
- VmRSS                       size of memory portions. It contains the three
-                             following parts (VmRSS = RssAnon + RssFile + RssShmem)
- RssAnon                     size of resident anonymous memory
- RssFile                     size of resident file mappings
- RssShmem                    size of resident shmem memory (includes SysV shm,
-                             mapping of tmpfs and shared anonymous mappings)
- VmData                      size of private data segments
- VmStk                       size of stack segments
- VmExe                       size of text segment
- VmLib                       size of shared library code
- VmPTE                       size of page table entries
- VmSwap                      amount of swap used by anonymous private data
-                             (shmem swap usage is not included)
- HugetlbPages                size of hugetlb memory portions
- CoreDumping                 process's memory is currently being dumped
-                             (killing the process may lead to a corrupted core)
- THP_enabled		     process is allowed to use THP (returns 0 when
-			     PR_SET_THP_DISABLE is set on the process
- Threads                     number of threads
- SigQ                        number of signals queued/max. number for queue
- SigPnd                      bitmap of pending signals for the thread
- ShdPnd                      bitmap of shared pending signals for the process
- SigBlk                      bitmap of blocked signals
- SigIgn                      bitmap of ignored signals
- SigCgt                      bitmap of caught signals
- CapInh                      bitmap of inheritable capabilities
- CapPrm                      bitmap of permitted capabilities
- CapEff                      bitmap of effective capabilities
- CapBnd                      bitmap of capabilities bounding set
- CapAmb                      bitmap of ambient capabilities
- NoNewPrivs                  no_new_privs, like prctl(PR_GET_NO_NEW_PRIV, ...)
- Seccomp                     seccomp mode, like prctl(PR_GET_SECCOMP, ...)
- Speculation_Store_Bypass    speculative store bypass mitigation status
- Cpus_allowed                mask of CPUs on which this process may run
- Cpus_allowed_list           Same as previous, but in "list format"
- Mems_allowed                mask of memory nodes allowed to this process
- Mems_allowed_list           Same as previous, but in "list format"
- voluntary_ctxt_switches     number of voluntary context switches
- nonvoluntary_ctxt_switches  number of non voluntary context switches
-..............................................................................
-
-Table 1-3: Contents of the statm files (as of 2.6.8-rc3)
-..............................................................................
- Field    Content
- size     total program size (pages)		(same as VmSize in status)
- resident size of memory portions (pages)	(same as VmRSS in status)
- shared   number of pages that are shared	(i.e. backed by a file, same
-						as RssFile+RssShmem in status)
- trs      number of pages that are 'code'	(not including libs; broken,
-							includes data segment)
- lrs      number of pages of library		(always 0 on 2.6)
- drs      number of pages of data/stack		(including libs; broken,
-							includes library text)
- dt       number of dirty pages			(always 0 on 2.6)
-..............................................................................
-
-
-Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
-..............................................................................
- Field          Content
-  pid           process id
-  tcomm         filename of the executable
-  state         state (R is running, S is sleeping, D is sleeping in an
-                uninterruptible wait, Z is zombie, T is traced or stopped)
-  ppid          process id of the parent process
-  pgrp          pgrp of the process
-  sid           session id
-  tty_nr        tty the process uses
-  tty_pgrp      pgrp of the tty
-  flags         task flags
-  min_flt       number of minor faults
-  cmin_flt      number of minor faults with child's
-  maj_flt       number of major faults
-  cmaj_flt      number of major faults with child's
-  utime         user mode jiffies
-  stime         kernel mode jiffies
-  cutime        user mode jiffies with child's
-  cstime        kernel mode jiffies with child's
-  priority      priority level
-  nice          nice level
-  num_threads   number of threads
-  it_real_value	(obsolete, always 0)
-  start_time    time the process started after system boot
-  vsize         virtual memory size
-  rss           resident set memory size
-  rsslim        current limit in bytes on the rss
-  start_code    address above which program text can run
-  end_code      address below which program text can run
-  start_stack   address of the start of the main process stack
-  esp           current value of ESP
-  eip           current value of EIP
-  pending       bitmap of pending signals
-  blocked       bitmap of blocked signals
-  sigign        bitmap of ignored signals
-  sigcatch      bitmap of caught signals
-  0             (placeholder, used to be the wchan address, use /proc/PID/wchan instead)
-  0             (placeholder)
-  0             (placeholder)
-  exit_signal   signal to send to parent thread on exit
-  task_cpu      which CPU the task is scheduled on
-  rt_priority   realtime priority
-  policy        scheduling policy (man sched_setscheduler)
-  blkio_ticks   time spent waiting for block IO
-  gtime         guest time of the task in jiffies
-  cgtime        guest time of the task children in jiffies
-  start_data    address above which program data+bss is placed
-  end_data      address below which program data+bss is placed
-  start_brk     address above which program heap can be expanded with brk()
-  arg_start     address above which program command line is placed
-  arg_end       address below which program command line is placed
-  env_start     address above which program environment is placed
-  env_end       address below which program environment is placed
-  exit_code     the thread's exit_code in the form reported by the waitpid system call
-..............................................................................
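tcomm is printed in parentheses and may itself contain spaces or ')' characters. If you split a stat line on whitespace alone, every later field gets the wrong number.

The Python sketch below takes one line read from /proc/PID/stat. It splits on the last ')' and then indexes the remaining fields in the order of Table 1-4. The name parse_stat and the choice of returned fields are illustrative only.

```python
def parse_stat(line: str) -> dict:
    """Pick a few named fields out of one /proc/PID/stat line.

    tcomm is wrapped in parentheses and may contain spaces or ')', so the
    line is split at the *last* ')' rather than on whitespace alone.
    """
    pid_part, _, rest = line.partition(" (")
    tcomm, _, tail = rest.rpartition(")")
    fields = tail.split()  # fields[0] is "state", field 3 in Table 1-4
    return {
        "pid": int(pid_part),
        "tcomm": tcomm,
        "state": fields[0],
        "ppid": int(fields[1]),
        "utime": int(fields[11]),       # user mode jiffies
        "stime": int(fields[12]),       # kernel mode jiffies
        "num_threads": int(fields[17]),
    }
```

Indexing from the end of tcomm keeps the field numbers stable no matter what the executable name contains.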
-
-The /proc/PID/maps file contains the currently mapped memory regions and
-their access permissions.
-
-The format is:
-
-address           perms offset  dev   inode      pathname
-
-08048000-08049000 r-xp 00000000 03:00 8312       /opt/test
-08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test
-0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
-a7cb1000-a7cb2000 ---p 00000000 00:00 0
-a7cb2000-a7eb2000 rw-p 00000000 00:00 0
-a7eb2000-a7eb3000 ---p 00000000 00:00 0
-a7eb3000-a7ed5000 rw-p 00000000 00:00 0
-a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6
-a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6
-a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6
-a800b000-a800e000 rw-p 00000000 00:00 0
-a800e000-a8022000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
-a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
-a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
-a8024000-a8027000 rw-p 00000000 00:00 0
-a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
-a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
-a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
-aff35000-aff4a000 rw-p 00000000 00:00 0          [stack]
-ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
-
-where "address" is the range of the process address space that the mapping
-occupies, and "perms" is a set of permissions:
-
- r = read
- w = write
- x = execute
- s = shared
- p = private (copy on write)
-
-"offset" is the offset into the mapped file, "dev" is the device (major:minor),
-and "inode" is the inode on that device.  0 indicates that no inode is
-associated with the memory region, as would be the case with BSS
-(uninitialized data).  The "pathname" shows the name of the file associated
-with this mapping.  If the mapping is not associated with a file:
-
- [heap]                   = the heap of the program
- [stack]                  = the stack of the main process
- [vdso]                   = the "virtual dynamic shared object",
-                            the kernel system call handler
-
- or if empty, the mapping is anonymous.
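Each maps line has five fixed columns followed by an optional pathname, so a parser can split on whitespace at most five times.

This Python sketch assumes that the address range, offset and device numbers are hexadecimal, as in the example above. An empty pathname is reported as anonymous. The function name is illustrative.

```python
def parse_maps_line(line: str) -> dict:
    """Split one /proc/PID/maps line into its six columns."""
    # At most 6 fields: the pathname may contain spaces, or be absent entirely.
    parts = line.rstrip("\n").split(maxsplit=5)
    start, end = (int(x, 16) for x in parts[0].split("-"))
    major, minor = (int(x, 16) for x in parts[3].split(":"))
    return {
        "start": start,
        "end": end,
        "perms": parts[1],                 # e.g. "r-xp"; last char s/p
        "offset": int(parts[2], 16),
        "dev": (major, minor),
        "inode": int(parts[4]),            # 0 when no inode is associated
        "pathname": parts[5] if len(parts) > 5 else "",  # "" means anonymous
    }
```

For example, the first /opt/test line above yields start 0x08048000, end 0x08049000 and pathname "/opt/test". The mapping therefore covers 4096 bytes.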
-
-The /proc/PID/smaps is an extension based on maps, showing the memory
-consumption for each of the process's mappings. For each mapping (aka Virtual
-Memory Area, or VMA) there is a series of lines such as the following:
-
-08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash
-
-Size:               1084 kB
-KernelPageSize:        4 kB
-MMUPageSize:           4 kB
-Rss:                 892 kB
-Pss:                 374 kB
-Shared_Clean:        892 kB
-Shared_Dirty:          0 kB
-Private_Clean:         0 kB
-Private_Dirty:         0 kB
-Referenced:          892 kB
-Anonymous:             0 kB
-LazyFree:              0 kB
-AnonHugePages:         0 kB
-ShmemPmdMapped:        0 kB
-Shared_Hugetlb:        0 kB
-Private_Hugetlb:       0 kB
-Swap:                  0 kB
-SwapPss:               0 kB
-Locked:                0 kB
-THPeligible:           0
-VmFlags: rd ex mr mw me dw
-
-The first of these lines shows the same information as is displayed for the
-mapping in /proc/PID/maps.  Following lines show the size of the mapping
-(size); the size of each page allocated when backing a VMA (KernelPageSize),
-which is usually the same as the size in the page table entries; the page size
-used by the MMU when backing a VMA (in most cases, the same as KernelPageSize);
-the amount of the mapping that is currently resident in RAM (RSS); the
-process' proportional share of this mapping (PSS); and the number of clean and
-dirty shared and private pages in the mapping.
-
-The "proportional set size" (PSS) of a process is the count of pages it has
-in memory, where each page is divided by the number of processes sharing it.
-So if a process has 1000 pages all to itself, and 1000 shared with one other
-process, its PSS will be 1500.
-Note that even a page which is part of a MAP_SHARED mapping, but has only
-a single pte mapped, i.e. is currently used by only one process, is accounted
-as private and not as shared.
-"Referenced" indicates the amount of memory currently marked as referenced or
-accessed.
-"Anonymous" shows the amount of memory that does not belong to any file.  Even
-a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
-and a page is modified, the file page is replaced by a private anonymous copy.
-"LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
-The memory isn't freed immediately with madvise(). It's freed in memory
-pressure if the memory is clean. Please note that the printed value might
-be lower than the real value due to optimizations used in the current
-implementation. If this is not desirable please file a bug report.
-"AnonHugePages" shows the amount of memory backed by transparent hugepages.
-"ShmemPmdMapped" shows the amount of shared (shmem/tmpfs) memory backed by
-huge pages.
-"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by
-hugetlbfs pages, which are *not* counted in the "RSS" or "PSS" fields for
-historical reasons, nor in the {Shared,Private}_{Clean,Dirty} fields.
-"Swap" shows how much would-be-anonymous memory is also used, but out on swap.
-For shmem mappings, "Swap" also includes the size of the mapped (and not
-replaced by copy-on-write) part of the underlying shmem object out on swap.
-"SwapPss" shows the proportional swap share of this mapping. Unlike "Swap",
-this does not take into account swapped-out pages of underlying shmem objects.
-"Locked" indicates whether the mapping is locked in memory or not.
-"THPeligible" indicates whether the mapping is eligible for allocating THP
-pages - 1 if true, 0 otherwise. It just shows the current status.
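The PSS definition above can be checked with a small worked example. Each resident page contributes 1/n of a page, where n is the number of processes mapping it.

This Python sketch assumes we already have a hypothetical list of per-page map counts for one process. The kernel tracks these internally and uses its own fixed-point arithmetic, so this only illustrates the definition, not the kernel's computation.

```python
from fractions import Fraction

def pss_pages(mapcounts) -> Fraction:
    """PSS in pages: a page mapped by n processes contributes 1/n."""
    return sum((Fraction(1, n) for n in mapcounts), Fraction(0))

# 1000 private pages plus 1000 pages shared with one other process:
# 1000 * 1 + 1000 * 1/2 = 1500, matching the text above.
example = pss_pages([1] * 1000 + [2] * 1000)
```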
-
-The "VmFlags" field deserves a separate description. This member represents
-the kernel flags associated with the particular virtual memory area, encoded
-as two-letter codes. The codes are the following:
-    rd  - readable
-    wr  - writeable
-    ex  - executable
-    sh  - shared
-    mr  - may read
-    mw  - may write
-    me  - may execute
-    ms  - may share
-    gd  - stack segment grows down
-    pf  - pure PFN range
-    dw  - disabled write to the mapped file
-    lo  - pages are locked in memory
-    io  - memory mapped I/O area
-    sr  - sequential read advice provided
-    rr  - random read advice provided
-    dc  - do not copy area on fork
-    de  - do not expand area on remapping
-    ac  - area is accountable
-    nr  - swap space is not reserved for the area
-    ht  - area uses huge tlb pages
-    ar  - architecture specific flag
-    dd  - do not include area into core dump
-    sd  - soft-dirty flag
-    mm  - mixed map area
-    hg  - huge page advice flag
-    nh  - no-huge page advice flag
-    mg  - mergeable advice flag
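The two-letter codes on a VmFlags line can be turned back into the descriptions listed above.

This Python sketch copies that table into a dictionary. Because the set of flags changes between kernel releases, unknown codes are passed through unchanged instead of raising an error. The names VMFLAG_NAMES and decode_vmflags are illustrative.

```python
# Two-letter VmFlags codes as listed in this document; the set varies by
# kernel release, so decode_vmflags() keeps unknown codes as-is.
VMFLAG_NAMES = {
    "rd": "readable", "wr": "writeable", "ex": "executable", "sh": "shared",
    "mr": "may read", "mw": "may write", "me": "may execute", "ms": "may share",
    "gd": "stack segment grows down", "pf": "pure PFN range",
    "dw": "disabled write to the mapped file", "lo": "pages are locked in memory",
    "io": "memory mapped I/O area", "sr": "sequential read advice provided",
    "rr": "random read advice provided", "dc": "do not copy area on fork",
    "de": "do not expand area on remapping", "ac": "area is accountable",
    "nr": "swap space is not reserved for the area",
    "ht": "area uses huge tlb pages", "ar": "architecture specific flag",
    "dd": "do not include area into core dump", "sd": "soft-dirty flag",
    "mm": "mixed map area", "hg": "huge page advice flag",
    "nh": "no-huge page advice flag", "mg": "mergeable advice flag",
}

def decode_vmflags(line: str) -> list:
    """Decode a line such as "VmFlags: rd ex mr mw me dw"."""
    _, _, codes = line.partition(":")
    return [VMFLAG_NAMES.get(code, code) for code in codes.split()]
```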
-
-Note that there is no guarantee that every flag and associated mnemonic will
-be present in all future kernel releases. Flags may be removed or new ones
-added, and the interpretation of their meaning may change as well. So each
-consumer of these flags has to follow each specific kernel version for the
-exact semantics.
-
-This file is only present if the CONFIG_MMU kernel configuration option is
-enabled.
-
-Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy (consistent
-output can be achieved only within a single read call).
-This typically manifests when doing partial reads of these files while the
-memory map is being modified.  Despite the races, we do provide the following
-guarantees:
-
-1) The mapped addresses never go backwards, which implies no two
-   regions will ever overlap.
-2) If there is something at a given vaddr during the entirety of the
-   life of the smaps/maps walk, there will be some output for it.
-
-The /proc/PID/smaps_rollup file includes the same fields as /proc/PID/smaps,
-but their values are the sums of the corresponding values for all mappings of
-the process.  Additionally, it contains these fields:
-
-Pss_Anon
-Pss_File
-Pss_Shmem
-
-They represent the proportional shares of anonymous, file, and shmem pages, as
-described for smaps above.  These fields are omitted in smaps since each
-mapping identifies the type (anon, file, or shmem) of all pages it contains.
-Thus all information in smaps_rollup can be derived from smaps, but at a
-significantly higher cost.
-
-The /proc/PID/clear_refs file is used to reset the PG_Referenced and
-ACCESSED/YOUNG bits on both physical and virtual pages associated with a
-process, and the soft-dirty bit on the pte (see
-Documentation/admin-guide/mm/soft-dirty.rst for details).
-To clear the bits for all the pages associated with the process:
-    > echo 1 > /proc/PID/clear_refs
-
-To clear the bits for the anonymous pages associated with the process:
-    > echo 2 > /proc/PID/clear_refs
-
-To clear the bits for the file mapped pages associated with the process:
-    > echo 3 > /proc/PID/clear_refs
-
-To clear the soft-dirty bit:
-    > echo 4 > /proc/PID/clear_refs
-
-To reset the peak resident set size ("high water mark") to the process's
-current value:
-    > echo 5 > /proc/PID/clear_refs
-
-Any other value written to /proc/PID/clear_refs will have no effect.
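The echo commands above can also be issued from a program.

This Python sketch writes one of the documented values to clear_refs. Because the kernel silently ignores any other value, the helper rejects such values up front so that a typo does not go unnoticed; that check is a design choice of the sketch, not kernel behaviour. The proc_root parameter is a hypothetical hook so the helper can be exercised against a scratch directory. Writing to another process's clear_refs needs the appropriate permissions.

```python
import os

# Values documented above; the kernel ignores anything else.
CLEAR_REFS_MODES = {
    1: "all pages",
    2: "anonymous pages",
    3: "file mapped pages",
    4: "soft-dirty bits",
    5: "peak resident set size (high water mark)",
}

def clear_refs(pid, mode: int, proc_root: str = "/proc") -> None:
    """Equivalent to: echo <mode> > /proc/<pid>/clear_refs"""
    if mode not in CLEAR_REFS_MODES:
        raise ValueError(f"unsupported clear_refs value: {mode}")
    with open(os.path.join(proc_root, str(pid), "clear_refs"), "w") as f:
        f.write(f"{mode}\n")
```

For example, clear_refs("self", 4) would clear the soft-dirty bits of the calling process.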
-
-The /proc/pid/pagemap file gives the PFN, which can be used to find the page
-flags using /proc/kpageflags and the number of times a page is mapped using
-/proc/kpagecount. For a detailed explanation, see
-Documentation/admin-guide/mm/pagemap.rst.
-
-The /proc/pid/numa_maps is an extension based on maps, showing the memory
-locality and binding policy, as well as the memory usage (in pages) of
-each mapping. The output follows a general format where mapping details are
-summarized, separated by blank spaces, one mapping per line:
-
-address   policy    mapping details
-
-00400000 default file=/usr/local/bin/app mapped=1 active=0 N3=1 kernelpagesize_kB=4
-00600000 default file=/usr/local/bin/app anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 N0=24 N3=2 kernelpagesize_kB=4
-320621f000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206220000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206221000 default anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206800000 default file=/lib64/libc-2.12.so mapped=59 mapmax=21 active=55 N0=41 N3=18 kernelpagesize_kB=4
-320698b000 default file=/lib64/libc-2.12.so
-3206b8a000 default file=/lib64/libc-2.12.so anon=2 dirty=2 N3=2 kernelpagesize_kB=4
-3206b8e000 default file=/lib64/libc-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206b8f000 default anon=3 dirty=3 active=1 N3=3 kernelpagesize_kB=4
-7f4dc10a2000 default anon=3 dirty=3 N3=3 kernelpagesize_kB=4
-7f4dc10b4000 default anon=2 dirty=2 active=1 N3=2 kernelpagesize_kB=4
-7f4dc1200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N3=1 kernelpagesize_kB=2048
-7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4
-7fff3369d000 default mapped=1 mapmax=35 active=0 N3=1 kernelpagesize_kB=4
-
-Where:
-"address" is the starting address for the mapping;
-"policy" reports the NUMA memory policy set for the mapping (see
-Documentation/admin-guide/mm/numa_memory_policy.rst);
-"mapping details" summarizes mapping data such as mapping type, page usage
-counters, node locality page counters (N0 == node0, N1 == node1, ...) and the
-kernel page size, in KB, backing the mapping.
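The node locality counters (N0, N1, ...) count pages, so multiplying them by kernelpagesize_kB gives per-node memory in kB.

This Python sketch does that for one numa_maps line. It skips the address and policy columns, and it ignores bare words such as "huge" or "stack" as well as every other key=value pair. If the line carries no page size, it returns an empty result. The function name is illustrative.

```python
def node_usage_kb(line: str) -> dict:
    """Per-NUMA-node memory, in kB, for one /proc/PID/numa_maps line."""
    page_kb = None
    pages = {}
    # Skip "address" and "policy"; the rest are key=value pairs or bare words.
    for field in line.split()[2:]:
        key, sep, value = field.partition("=")
        if not sep:
            continue  # bare flags such as "huge" or "stack"
        if key == "kernelpagesize_kB":
            page_kb = int(value)
        elif key.startswith("N") and key[1:].isdigit():
            pages[int(key[1:])] = int(value)
    if page_kb is None:
        return {}
    return {node: count * page_kb for node, count in pages.items()}
```

Applied to the libc line with N0=41 N3=18 and kernelpagesize_kB=4, this gives 164 kB on node 0 and 72 kB on node 3.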
-
-1.2 Kernel data
----------------
-
-Similar to the process entries, the kernel data files give information about
-the running kernel. The files used to obtain this information are contained in
-/proc and are listed in Table 1-5. Not all of these will be present in your
-system: which files are there, and which are missing, depends on the kernel
-configuration and the loaded modules.
-
-Table 1-5: Kernel info in /proc
-..............................................................................
- File        Content
- apm         Advanced power management info
- buddyinfo   Kernel memory allocator information (see text)    (2.5)
- bus         Directory containing bus specific information
- cmdline     Kernel command line
- cpuinfo     Info about the CPU
- devices     Available devices (block and character)
- dma         Used DMA channels
- filesystems Supported filesystems
- driver      Various drivers grouped here, currently rtc       (2.4)
- execdomains Execdomains, related to security                  (2.4)
- fb          Frame Buffer devices                              (2.4)
- fs          File system parameters, currently nfs/exports     (2.4)
- ide         Directory containing info about the IDE subsystem
- interrupts  Interrupt usage
- iomem       Memory map                                        (2.4)
- ioports     I/O port usage
- irq         Masks for irq to cpu affinity                     (2.4)(smp?)
- isapnp      ISA PnP (Plug&Play) Info                          (2.4)
- kcore       Kernel core image (can be ELF or A.OUT (deprecated in 2.4))
- kmsg        Kernel messages
- ksyms       Kernel symbol table
- loadavg     Load average of last 1, 5 & 15 minutes
- locks       Kernel locks
- meminfo     Memory info
- misc        Miscellaneous
- modules     List of loaded modules
- mounts      Mounted filesystems
- net         Networking info (see text)
- pagetypeinfo Additional page allocator information (see text) (2.5)
- partitions  Table of partitions known to the system
- pci         Deprecated info of PCI bus (new way -> /proc/bus/pci/,
-             decoupled by lspci)                               (2.4)
- rtc         Real time clock
- scsi        SCSI info (see text)
- slabinfo    Slab pool info
- softirqs    softirq usage
- stat        Overall statistics
- swaps       Swap space utilization
- sys         See chapter 2
- sysvipc     Info of SysVIPC Resources (msg, sem, shm)         (2.4)
- tty         Info of tty drivers
- uptime      Wall clock since boot, combined idle time of all cpus
- version     Kernel version
- video       bttv info of video resources                      (2.4)
- vmallocinfo Show vmalloced areas
-..............................................................................
-
-You can, for example, check which interrupts are currently in use and what
-they are used for by looking in the file /proc/interrupts:
-
-  > cat /proc/interrupts 
-             CPU0        
-    0:    8728810          XT-PIC  timer 
-    1:        895          XT-PIC  keyboard 
-    2:          0          XT-PIC  cascade 
-    3:     531695          XT-PIC  aha152x 
-    4:    2014133          XT-PIC  serial 
-    5:      44401          XT-PIC  pcnet_cs 
-    8:          2          XT-PIC  rtc 
-   11:          8          XT-PIC  i82365 
-   12:     182918          XT-PIC  PS/2 Mouse 
-   13:          1          XT-PIC  fpu 
-   14:    1232265          XT-PIC  ide0 
-   15:          7          XT-PIC  ide1 
-  NMI:          0 
-
-In 2.4.* a couple of lines were added to this file, LOC and ERR (this time the
-output is from an SMP machine):
-
-  > cat /proc/interrupts 
-
-             CPU0       CPU1       
-    0:    1243498    1214548    IO-APIC-edge  timer
-    1:       8949       8958    IO-APIC-edge  keyboard
-    2:          0          0          XT-PIC  cascade
-    5:      11286      10161    IO-APIC-edge  soundblaster
-    8:          1          0    IO-APIC-edge  rtc
-    9:      27422      27407    IO-APIC-edge  3c503
-   12:     113645     113873    IO-APIC-edge  PS/2 Mouse
-   13:          0          0          XT-PIC  fpu
-   14:      22491      24012    IO-APIC-edge  ide0
-   15:       2183       2415    IO-APIC-edge  ide1
-   17:      30564      30414   IO-APIC-level  eth0
-   18:        177        164   IO-APIC-level  bttv
-  NMI:    2457961    2457959 
-  LOC:    2457882    2457881 
-  ERR:       2155
-
-NMI is incremented in this case because every timer interrupt generates an NMI
-(Non Maskable Interrupt) which is used by the NMI Watchdog to detect lockups.
-
-LOC is the local interrupt counter of the internal APIC of every CPU.
-
-ERR is incremented in the case of errors in the IO-APIC bus (the bus that
-connects the CPUs in an SMP system). When an error is detected, the IO-APIC
-automatically retries the transmission, so it should not be a big problem,
-but you should read the SMP-FAQ.
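Rows in /proc/interrupts carry one count per CPU column, except rows such as ERR that carry a single count. Each row is followed by an optional controller type and device name.

This Python sketch reads the CPU count from the header row. It then takes up to that many leading numeric tokens from each row and stops at the first non-numeric token. The function name is illustrative.

```python
def parse_interrupts(text: str) -> dict:
    """Map each /proc/interrupts row label ("0", "NMI", "ERR", ...) to counts."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpus = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    rows = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        # Up to one count per CPU; ERR carries a single system-wide count.
        for token in rest.split()[:ncpus]:
            if not token.isdigit():
                break  # reached the controller type / device name
            counts.append(int(token))
        rows[label.strip()] = counts
    return rows
```

Summing a row gives the total for that IRQ. For example, IRQ 0 in the SMP output above was raised 1243498 + 1214548 = 2458046 times.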
-
-In 2.6.2* /proc/interrupts was expanded again.  This time the goal was for
-/proc/interrupts to display every IRQ vector in use by the system, not
-just those considered 'most important'.  The new vectors are:
-
-  THR -- interrupt raised when a machine check threshold counter
-  (typically counting ECC corrected errors of memory or cache) exceeds
-  a configurable threshold.  Only available on some systems.
-
-  TRM -- a thermal event interrupt occurs when a temperature threshold
-  has been exceeded for the CPU.  This interrupt may also be generated
-  when the temperature drops back to normal.
-
-  SPU -- a spurious interrupt is some interrupt that was raised then lowered
-  by some IO device before it could be fully processed by the APIC.  Hence
-  the APIC sees the interrupt but does not know what device it came from.
-  For this case the APIC will generate the interrupt with an IRQ vector
-  of 0xff. This might also be generated by chipset bugs.
-
-  RES, CAL, TLB -- rescheduling, call and TLB flush interrupts are
-  sent from one CPU to another per the needs of the OS.  Typically,
-  their statistics are used by kernel developers and interested users to
-  determine the occurrence of interrupts of the given type.
-
-The above IRQ vectors are displayed only when relevant.  For example,
-the threshold vector does not exist on x86_64 platforms.  Others are
-suppressed when the system is a uniprocessor.  As of this writing, only
-i386 and x86_64 platforms support the new IRQ vector displays.
-
-Of some interest is the introduction of the /proc/irq directory to 2.4.
-It can be used to set IRQ-to-CPU affinity: you can "hook" an IRQ to only one
-CPU, or exclude a CPU from handling IRQs. The irq directory contains one
-subdirectory for each IRQ, and two files: default_smp_affinity and
-prof_cpu_mask.
-
-For example:
-  > ls /proc/irq/
-  0  10  12  14  16  18  2  4  6  8  prof_cpu_mask
-  1  11  13  15  17  19  3  5  7  9  default_smp_affinity
-  > ls /proc/irq/0/
-  smp_affinity
-
-smp_affinity is a bitmask in which you can specify which CPUs can handle the
-IRQ. You can set it by doing:
-
-  > echo 1 > /proc/irq/10/smp_affinity
-
-This means that only the first CPU will handle the IRQ, but you can also echo
-5, which means that only the first and third CPUs can handle the IRQ.
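In an smp_affinity mask, bit N is set when CPU N may handle the IRQ, so {0} gives 1 and {0, 2} gives 5.

This Python sketch converts in both directions. It assumes that wide masks are printed as comma-separated groups of 32 bits, and it simply strips the commas when reading. The function names are illustrative.

```python
def cpus_to_mask(cpus) -> str:
    """Hex smp_affinity mask in which bit N set means CPU N may handle the IRQ."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def mask_to_cpus(mask: str) -> list:
    """CPU numbers whose bits are set in a hex mask such as "5" or "ffffffff"."""
    value = int(mask.strip().replace(",", ""), 16)  # drop 32-bit group commas
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]
```

For example, cpus_to_mask([0, 2]) yields "5", the value used above to allow the first and third CPUs.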
-
-The contents of each smp_affinity file are the same by default:
-
-  > cat /proc/irq/0/smp_affinity
-  ffffffff
-
-There is an alternate interface, smp_affinity_list, which allows specifying
-a CPU range instead of a bitmask:
-
-  > cat /proc/irq/0/smp_affinity_list
-  1024-1031
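The smp_affinity_list format uses comma-separated single CPUs and inclusive ranges, such as "1024-1031" above.

This Python sketch expands such a string into individual CPU numbers. The function name is illustrative.

```python
def parse_cpu_list(text: str) -> list:
    """Expand an smp_affinity_list string such as "0-3,8" into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A bare number is a one-element range; "a-b" includes both ends.
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus
```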
-
-The default_smp_affinity mask applies to all non-active IRQs, which are the
-IRQs which have not yet been allocated/activated, and hence which lack a
-/proc/irq/[0-9]* directory.
-
-The node file on an SMP system shows the node to which the device using the IRQ
-reports itself as being attached. This hardware locality information does not
-include information about any possible driver locality preference.
-
-prof_cpu_mask specifies which CPUs are to be profiled by the system wide
-profiler. Default value is ffffffff (all cpus if there are only 32 of them).
-
-The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
-between all the CPUs which are allowed to handle it. As usual the kernel has
-more info than you and does a better job than you, so the defaults are the
-best choice for almost everyone.  [Note this applies only to those IO-APICs
-that support "Round Robin" interrupt distribution.]
-
-There are three more important subdirectories in /proc: net, scsi, and sys.
-The general rule is that the contents, or even the existence of these
-directories, depend on your kernel configuration. If SCSI is not enabled, the
-directory scsi may not exist. The same is true of net, which is there only
-when networking support is present in the running kernel.
-
-The slabinfo file gives information about memory usage at the slab level.
-As of version 2.2, Linux uses slab pools for memory management above the page
-level. Commonly used objects have their own slab pool (such as network
-buffers, directory cache, and so on).
-
-..............................................................................
-
-> cat /proc/buddyinfo
-
-Node 0, zone      DMA      0      4      5      4      4      3 ...
-Node 0, zone   Normal      1      0      0      1    101      8 ...
-Node 0, zone  HighMem      2      0      0      1      1      0 ...
-
-External fragmentation is a problem under some workloads, and buddyinfo is a
-useful tool for helping diagnose these problems.  Buddyinfo will give you a
-clue as to how big an area you can safely allocate, or why a previous
-allocation failed.
-
-Each column represents the number of free blocks of a certain order which
-are available.  In this case, there are 0 chunks of 2^0*PAGE_SIZE available
-in ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of
-2^4*PAGE_SIZE available in ZONE_NORMAL, etc...
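Since column k counts free chunks of 2^k pages, a zone's free memory is the sum of count * 2^k pages. The highest non-zero column bounds the largest allocation the zone can satisfy right now.

This Python sketch assumes a complete buddyinfo line, not one truncated with "..." as in the example. It also assumes a 4096-byte PAGE_SIZE, which is passed as a parameter because page size is architecture dependent. The function names are illustrative.

```python
def zone_free_bytes(line: str, page_size: int = 4096) -> int:
    """Free memory in one buddyinfo zone, from its per-order chunk counts."""
    # "Node 0, zone   Normal   1  0 ..." -> counts start at the fifth token;
    # the count in column k is the number of free chunks of 2**k pages.
    counts = line.split()[4:]
    return sum(int(c) << order for order, c in enumerate(counts)) * page_size

def largest_free_order(line: str) -> int:
    """Highest order with at least one free chunk, or -1 if there is none."""
    counts = [int(c) for c in line.split()[4:]]
    return max((order for order, c in enumerate(counts) if c), default=-1)
```

For the first six DMA columns shown above (0 4 5 4 4 3), this gives 220 free pages, or 901120 bytes with 4 KiB pages.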
-
-More information relevant to external fragmentation can be found in
-pagetypeinfo.
-
-> cat /proc/pagetypeinfo
-Page block order: 9
-Pages per block:  512
-
-Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
-Node    0, zone      DMA, type    Unmovable      0      0      0      1      1      1      1      1      1      1      0
-Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
-Node    0, zone      DMA, type      Movable      1      1      2      1      2      1      1      0      1      0      2
-Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
-Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
-Node    0, zone    DMA32, type    Unmovable    103     54     77      1      1      1     11      8      7      1      9
-Node    0, zone    DMA32, type  Reclaimable      0      0      2      1      0      0      0      0      1      0      0
-Node    0, zone    DMA32, type      Movable    169    152    113     91     77     54     39     13      6      1    452
-Node    0, zone    DMA32, type      Reserve      1      2      2      2      2      0      1      1      1      1      0
-Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
-
-Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
-Node 0, zone      DMA            2            0            5            1            0
-Node 0, zone    DMA32           41            6          967            2            0
-
-Fragmentation avoidance in the kernel works by grouping pages of different
-migrate types into the same contiguous regions of memory called page blocks.
-A page block is typically the size of the default hugepage size e.g. 2MB on
-X86-64. By keeping pages grouped based on their ability to move, the kernel
-can reclaim pages within a page block to satisfy a high-order allocation.
-
-The pagetypeinfo begins with information on the size of a page block. It
-then gives the same type of information as buddyinfo except broken down
-by migrate-type and finishes with details on how many page blocks of each
-type exist.
-
-If min_free_kbytes has been tuned correctly (recommendations made by hugeadm
-from libhugetlbfs https://github.com/libhugetlbfs/libhugetlbfs/), one can
-make an estimate of the likely number of huge pages that can be allocated
-at a given point in time. All the "Movable" blocks should be allocatable
-unless memory has been mlock()'d. Some of the Reclaimable blocks should
-also be allocatable although a lot of filesystem metadata may have to be
-reclaimed to achieve this.
-
-..............................................................................
-
-meminfo:
-
-Provides information about distribution and utilization of memory.  This
-varies by architecture and compile options.  The following is from a
-16GB PIII, which has highmem enabled.  You may not have all of these fields.
-
-> cat /proc/meminfo
-
-MemTotal:     16344972 kB
-MemFree:      13634064 kB
-MemAvailable: 14836172 kB
-Buffers:          3656 kB
-Cached:        1195708 kB
-SwapCached:          0 kB
-Active:         891636 kB
-Inactive:      1077224 kB
-HighTotal:    15597528 kB
-HighFree:     13629632 kB
-LowTotal:       747444 kB
-LowFree:          4432 kB
-SwapTotal:           0 kB
-SwapFree:            0 kB
-Dirty:             968 kB
-Writeback:           0 kB
-AnonPages:      861800 kB
-Mapped:         280372 kB
-Shmem:             644 kB
-KReclaimable:   168048 kB
-Slab:           284364 kB
-SReclaimable:   159856 kB
-SUnreclaim:     124508 kB
-PageTables:      24448 kB
-NFS_Unstable:        0 kB
-Bounce:              0 kB
-WritebackTmp:        0 kB
-CommitLimit:   7669796 kB
-Committed_AS:   100056 kB
-VmallocTotal:   112216 kB
-VmallocUsed:       428 kB
-VmallocChunk:   111088 kB
-Percpu:          62080 kB
-HardwareCorrupted:   0 kB
-AnonHugePages:   49152 kB
-ShmemHugePages:      0 kB
-ShmemPmdMapped:      0 kB
-
-
-    MemTotal: Total usable ram (i.e. physical ram minus a few reserved
-              bits and the kernel binary code)
-     MemFree: The sum of LowFree+HighFree
-MemAvailable: An estimate of how much memory is available for starting new
-              applications, without swapping. Calculated from MemFree,
-              SReclaimable, the size of the file LRU lists, and the low
-              watermarks in each zone.
-              The estimate takes into account that the system needs some
-              page cache to function well, and that not all reclaimable
-              slab will be reclaimable, due to items being in use. The
-              impact of those factors will vary from system to system.
-     Buffers: Relatively temporary storage for raw disk blocks;
-              shouldn't get tremendously large (20MB or so)
-      Cached: in-memory cache for files read from the disk (the
-              pagecache).  Doesn't include SwapCached
-  SwapCached: Memory that once was swapped out, is swapped back in but
-              still also is in the swapfile (if memory is needed it
-              doesn't need to be swapped out AGAIN because it is already
-              in the swapfile. This saves I/O)
-      Active: Memory that has been used more recently and usually not
-              reclaimed unless absolutely necessary.
-    Inactive: Memory which has been less recently used.  It is more
-              eligible to be reclaimed for other purposes.
-   HighTotal:
-    HighFree: Highmem is all memory above ~860MB of physical memory.
-              Highmem areas are for use by userspace programs, or
-              for the pagecache.  The kernel must use tricks to access
-              this memory, making it slower to access than lowmem.
-    LowTotal:
-     LowFree: Lowmem is memory which can be used for everything that
-              highmem can be used for, but it is also available for the
-              kernel's use for its own data structures.  Among many
-              other things, it is where everything from the Slab is
-              allocated.  Bad things happen when you're out of lowmem.
-   SwapTotal: total amount of swap space available
-    SwapFree: Amount of swap space that is currently unused
-       Dirty: Memory which is waiting to get written back to the disk
-   Writeback: Memory which is actively being written back to the disk
-   AnonPages: Non-file backed pages mapped into userspace page tables
-HardwareCorrupted: The amount of RAM, in KB, that the kernel identifies as
-              corrupted.
-AnonHugePages: Non-file backed huge pages mapped into userspace page tables
-      Mapped: files which have been mmapped, such as libraries
-       Shmem: Total memory used by shared memory (shmem) and tmpfs
-ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
-              with huge pages
-ShmemPmdMapped: Shared memory mapped into userspace with huge pages
-KReclaimable: Kernel allocations that the kernel will attempt to reclaim
-              under memory pressure. Includes SReclaimable (below), and other
-              direct allocations with a shrinker.
-        Slab: in-kernel data structures cache
-SReclaimable: Part of Slab, that might be reclaimed, such as caches
-  SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
-  PageTables: amount of memory dedicated to the lowest level of page
-              tables.
-NFS_Unstable: NFS pages sent to the server, but not yet committed to stable
-              storage
-      Bounce: Memory used for block device "bounce buffers"
-WritebackTmp: Memory used by FUSE for temporary writeback buffers
- CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
-              this is the total amount of memory currently available to
-              be allocated on the system. This limit is only adhered to
-              if strict overcommit accounting is enabled (mode 2 in
-              'vm.overcommit_memory').
-              The CommitLimit is calculated with the following formula:
-              CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
-                             overcommit_ratio / 100 + [total swap pages]
-              For example, on a system with 1G of physical RAM and 7G
-              of swap with a 'vm.overcommit_ratio' of 30 it would
-              yield a CommitLimit of 7.3G.
-              For more details, see the memory overcommit documentation
-              in vm/overcommit-accounting.
-Committed_AS: The amount of memory presently allocated on the system.
-              The committed memory is a sum of all of the memory which
-              has been allocated by processes, even if it has not been
-              "used" by them as of yet. A process which malloc()'s 1G
-              of memory, but only touches 300M of it will show up as
-              using 1G. This 1G is memory which has been "committed" to
-              by the VM and can be used at any time by the allocating
-              application. With strict overcommit enabled on the system
-              (mode 2 in 'vm.overcommit_memory'), allocations which would
-              exceed the CommitLimit (detailed above) will not be permitted.
-              This is useful if one needs to guarantee that processes will
-              not fail due to lack of memory once that memory has been
-              successfully allocated.
-VmallocTotal: total size of vmalloc memory area
- VmallocUsed: amount of vmalloc area which is used
-VmallocChunk: largest contiguous block of vmalloc area which is free
-      Percpu: Memory allocated to the percpu allocator used to back percpu
-              allocations. This stat excludes the cost of metadata.
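-
-As a worked example of the CommitLimit formula in the CommitLimit entry
-above, the following C sketch computes the limit in pages. The function and
-parameter names are hypothetical and chosen for illustration only. The
-kernel's own computation may differ in detail.
-
-```c
-#include <stdint.h>
-
-/* Hypothetical helper: CommitLimit in pages, per the formula above.
- * total_ram_pages:     [total RAM pages]
- * total_hugetlb_pages: [total huge TLB pages]
- * overcommit_ratio:    value of vm.overcommit_ratio (a percentage)
- * total_swap_pages:    [total swap pages] */
-static uint64_t commit_limit_pages(uint64_t total_ram_pages,
-                                   uint64_t total_hugetlb_pages,
-                                   uint64_t overcommit_ratio,
-                                   uint64_t total_swap_pages)
-{
-    /* Integer division truncates, so the RAM share is rounded down. */
-    return (total_ram_pages - total_hugetlb_pages) * overcommit_ratio / 100
-           + total_swap_pages;
-}
-```
-
-With 4 KiB pages, 1G of RAM is 262144 pages and 7G of swap is 1835008
-pages. So commit_limit_pages(262144, 0, 30, 1835008) returns 1913651 pages,
-which is about 7.3G and matches the example above.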
-
-..............................................................................
-
-vmallocinfo:
-
-Provides information about vmalloced/vmaped areas. One line per area,
-containing the virtual address range of the area, size in bytes,
-caller information of the creator, and optional information depending
-on the kind of area:
-
- pages=nr    number of pages
- phys=addr   if a physical address was specified
- ioremap     I/O mapping (ioremap() and friends)
- vmalloc     vmalloc() area
- vmap        vmap()ed pages
- user        VM_USERMAP area
- vpages      buffer for pages pointers was vmalloced (huge area)
- N<node>=nr  (Only on NUMA kernels)
-             Number of pages allocated on memory node <node>
-
-> cat /proc/vmallocinfo
-0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
-  /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
-0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
-  /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
-0xffffc20000302000-0xffffc20000304000    8192 acpi_tb_verify_table+0x21/0x4f...
-  phys=7fee8000 ioremap
-0xffffc20000304000-0xffffc20000307000   12288 acpi_tb_verify_table+0x21/0x4f...
-  phys=7fee7000 ioremap
-0xffffc2000031d000-0xffffc2000031f000    8192 init_vdso_vars+0x112/0x210
-0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e ...
-  /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
-0xffffc2000033a000-0xffffc2000033d000   12288 sys_swapon+0x640/0xac0      ...
-  pages=2 vmalloc N1=2
-0xffffc20000347000-0xffffc2000034c000   20480 xt_alloc_table_info+0xfe ...
-  /0x130 [x_tables] pages=4 vmalloc N0=4
-0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 ...
-   pages=14 vmalloc N2=14
-0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 ...
-   pages=4 vmalloc N1=4
-0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 ...
-   pages=2 vmalloc N1=2
-0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
-   pages=10 vmalloc N0=10
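-
-In the sample output above, the size column equals the end address minus the
-start address of the range. For the first line, 0xffffc20000201000 minus
-0xffffc20000000000 is 2101248. Below is a minimal C sketch of that
-calculation. The helper name is hypothetical.
-
-```c
-#include <stdio.h>
-
-/* Hypothetical helper: return the size in bytes of the area described by a
- * /proc/vmallocinfo line, computed from its leading "start-end" range.
- * Returns 0 if the range cannot be parsed. */
-static unsigned long long vmalloc_range_size(const char *line)
-{
-    unsigned long long start, end;
-
-    /* %llx accepts an optional 0x prefix and stops at the '-'. */
-    if (sscanf(line, "%llx-%llx", &start, &end) != 2 || end < start)
-        return 0;
-    return end - start;
-}
-```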
-
-..............................................................................
-
-softirqs:
-
-Provides counts of softirq handlers serviced since boot time, for each cpu.
-
-> cat /proc/softirqs
-                CPU0       CPU1       CPU2       CPU3
-      HI:          0          0          0          0
-   TIMER:      27166      27120      27097      27034
-  NET_TX:          0          0          0         17
-  NET_RX:         42          0          0         39
-   BLOCK:          0          0        107       1121
- TASKLET:          0          0          0        290
-   SCHED:      27035      26983      26971      26746
- HRTIMER:          0          0          0          0
-     RCU:       1678       1769       2178       2250
-
-
-1.3 IDE devices in /proc/ide
-----------------------------
-
-The subdirectory /proc/ide contains information about all IDE devices of which
-the kernel is aware. There is one subdirectory for each IDE controller, the
-file drivers, and a link for each IDE device, pointing to the device directory
-in the controller-specific subtree.
-
-The file drivers contains general information about the drivers used for the
-IDE devices:
-
-  > cat /proc/ide/drivers
-  ide-cdrom version 4.53
-  ide-disk version 1.08
-
-More detailed information can be found in the controller-specific
-subdirectories. These are named ide0, ide1, and so on. Each of these
-directories contains the files shown in Table 1-6.
-
-
-Table 1-6: IDE controller info in /proc/ide/ide?
-..............................................................................
- File    Content                                 
- channel IDE channel (0 or 1)                    
- config  Configuration (only for PCI/IDE bridge) 
- mate    Mate name                               
- model   Type/Chipset of IDE controller          
-..............................................................................
-
-Each device connected to a controller has a separate subdirectory in the
-controller's directory. The files listed in Table 1-7 are contained in these
-directories.
-
-
-Table 1-7: IDE device information
-..............................................................................
- File             Content                                    
- cache            The cache                                  
- capacity         Capacity of the medium (in 512-byte blocks)
- driver           driver and version                         
- geometry         physical and logical geometry              
- identify         device identify block                      
- media            media type                                 
- model            device identifier                          
- settings         device setup                               
- smart_thresholds IDE disk management thresholds             
- smart_values     IDE disk management values                 
-..............................................................................
-
-The most interesting file is settings. This file contains a nice overview of
-the drive parameters:
-
-  # cat /proc/ide/ide0/hda/settings 
-  name                    value           min             max             mode 
-  ----                    -----           ---             ---             ---- 
-  bios_cyl                526             0               65535           rw 
-  bios_head               255             0               255             rw 
-  bios_sect               63              0               63              rw 
-  breada_readahead        4               0               127             rw 
-  bswap                   0               0               1               r 
-  file_readahead          72              0               2097151         rw 
-  io_32bit                0               0               3               rw 
-  keepsettings            0               0               1               rw 
-  max_kb_per_request      122             1               127             rw 
-  multcount               0               0               8               rw 
-  nice1                   1               0               1               rw 
-  nowerr                  0               0               1               rw 
-  pio_mode                write-only      0               255             w 
-  slow                    0               0               1               rw 
-  unmaskirq               0               0               1               rw 
-  using_dma               0               0               1               rw 
-
-
-1.4 Networking info in /proc/net
---------------------------------
-
-The subdirectory /proc/net follows the usual pattern. Table 1-8 shows the
-additional values you get for IP version 6 if you configure the kernel to
-support this. Table 1-9 lists the files and their meaning.
-
-
-Table 1-8: IPv6 info in /proc/net
-..............................................................................
- File       Content                                               
- udp6       UDP sockets (IPv6)                                    
- tcp6       TCP sockets (IPv6)                                    
- raw6       Raw device statistics (IPv6)                          
- igmp6      IP multicast addresses that this host has joined (IPv6)
- if_inet6   List of IPv6 interface addresses                      
- ipv6_route Kernel routing table for IPv6                         
- rt6_stats  Global IPv6 routing tables statistics                 
- sockstat6  Socket statistics (IPv6)                              
- snmp6      SNMP data (IPv6)
-..............................................................................
-
-
-Table 1-9: Network info in /proc/net
-..............................................................................
- File          Content                                                         
- arp           Kernel ARP table
- dev           network devices with statistics                                 
- dev_mcast     the Layer2 multicast groups a device is listening to
-               (interface index, label, number of references, number of bound
-               addresses).
- dev_stat      network device status                                           
- ip_fwchains   Firewall chain linkage                                          
- ip_fwnames    Firewall chain names                                            
- ip_masq       Directory containing the masquerading tables                    
- ip_masquerade Major masquerading table                                        
- netstat       Network statistics                                              
- raw           raw device statistics                                           
- route         Kernel routing table                                            
- rpc           Directory containing rpc info                                   
- rt_cache      Routing cache                                                   
- snmp          SNMP data                                                       
- sockstat      Socket statistics                                               
- tcp           TCP sockets
- udp           UDP sockets                                                     
- unix          UNIX domain sockets                                             
- wireless      Wireless interface data (Wavelan etc)                           
- igmp          IP multicast addresses that this host has joined
- psched        Global packet scheduler parameters.                             
- netlink       List of PF_NETLINK sockets                                      
- ip_mr_vifs    List of multicast virtual interfaces                            
- ip_mr_cache   List of multicast routing cache                                 
-..............................................................................
-
-You can use this information to see which network devices are available in
-your system and how much traffic was routed over those devices:
-
-  > cat /proc/net/dev 
-  Inter-|Receive                                                   |[... 
-   face |bytes    packets errs drop fifo frame compressed multicast|[... 
-      lo:  908188   5596     0    0    0     0          0         0 [...         
-    ppp0:15475140  20721   410    0    0   410          0         0 [...  
-    eth0:  614530   7085     0    0    0     0          0         1 [... 
-   
-  ...] Transmit 
-  ...] bytes    packets errs drop fifo colls carrier compressed 
-  ...]  908188     5596    0    0    0     0       0          0 
-  ...] 1375103    17405    0    0    0     0       0          0 
-  ...] 1703981     5535    0    0    0     3       0          0 
-
-In addition, each Channel Bond interface has its own directory. For
-example, the bond0 device will have a directory called /proc/net/bond0/.
-It will contain information that is specific to that bond, such as the
-current slaves of the bond, the link status of the slaves, and how
-many times the slaves' links have failed.
-
-1.5 SCSI info
--------------
-
-If you have a SCSI host adapter in your system, you'll find a subdirectory
-named after the driver for this adapter in /proc/scsi. You'll also see a list
-of all recognized SCSI devices in /proc/scsi:
-
-  > cat /proc/scsi/scsi
-  Attached devices: 
-  Host: scsi0 Channel: 00 Id: 00 Lun: 00 
-    Vendor: IBM      Model: DGHS09U          Rev: 03E0 
-    Type:   Direct-Access                    ANSI SCSI revision: 03 
-  Host: scsi0 Channel: 00 Id: 06 Lun: 00 
-    Vendor: PIONEER  Model: CD-ROM DR-U06S   Rev: 1.04 
-    Type:   CD-ROM                           ANSI SCSI revision: 02 
-
-
-The directory named after the driver has one file for each adapter found in
-the system. These files contain information about the controller, including
-the IRQ used and the I/O address range. The amount of information shown
-depends on the adapter you use. The example shows the output for an Adaptec
-AHA-2940 SCSI adapter:
-
-  > cat /proc/scsi/aic7xxx/0 
-   
-  Adaptec AIC7xxx driver version: 5.1.19/3.2.4 
-  Compile Options: 
-    TCQ Enabled By Default : Disabled 
-    AIC7XXX_PROC_STATS     : Disabled 
-    AIC7XXX_RESET_DELAY    : 5 
-  Adapter Configuration: 
-             SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter 
-                             Ultra Wide Controller 
-      PCI MMAPed I/O Base: 0xeb001000 
-   Adapter SEEPROM Config: SEEPROM found and used. 
-        Adaptec SCSI BIOS: Enabled 
-                      IRQ: 10 
-                     SCBs: Active 0, Max Active 2, 
-                           Allocated 15, HW 16, Page 255 
-               Interrupts: 160328 
-        BIOS Control Word: 0x18b6 
-     Adapter Control Word: 0x005b 
-     Extended Translation: Enabled 
-  Disconnect Enable Flags: 0xffff 
-       Ultra Enable Flags: 0x0001 
-   Tag Queue Enable Flags: 0x0000 
-  Ordered Queue Tag Flags: 0x0000 
-  Default Tag Queue Depth: 8 
-      Tagged Queue By Device array for aic7xxx host instance 0: 
-        {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255} 
-      Actual queue depth per device for aic7xxx host instance 0: 
-        {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1} 
-  Statistics: 
-  (scsi0:0:0:0) 
-    Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8 
-    Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0) 
-    Total transfers 160151 (74577 reads and 85574 writes) 
-  (scsi0:0:6:0) 
-    Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15 
-    Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0) 
-    Total transfers 0 (0 reads and 0 writes) 
-
-
-1.6 Parallel port info in /proc/parport
----------------------------------------
-
-The directory /proc/parport contains information about the parallel ports of
-your system. It has one subdirectory for each port, named after the port
-number (0, 1, 2, ...).
-
-These directories contain the four files shown in Table 1-10.
-
-
-Table 1-10: Files in /proc/parport
-..............................................................................
- File      Content                                                             
- autoprobe Any IEEE-1284 device ID information that has been acquired.         
- devices   list of the device drivers using that port. A + will appear by the
-           name of the device currently using the port (it might not appear
-           against any). 
- hardware  Parallel port's base address, IRQ line and DMA channel.             
- irq       IRQ that parport is using for that port. This is in a separate
-           file to allow you to alter it by writing a new value in (IRQ
-           number or none). 
-..............................................................................
-
-1.7 TTY info in /proc/tty
--------------------------
-
-Information about the available and actually used ttys can be found in the
-directory /proc/tty. You'll find entries for drivers and line disciplines in
-this directory, as shown in Table 1-11.
-
-
-Table 1-11: Files in /proc/tty
-..............................................................................
- File          Content                                        
- drivers       list of drivers and their usage                
- ldiscs        registered line disciplines                    
- driver/serial usage statistics and status of single tty lines
-..............................................................................
-
-To see which ttys are currently in use, you can simply look into the file
-/proc/tty/drivers:
-
-  > cat /proc/tty/drivers 
-  pty_slave            /dev/pts      136   0-255 pty:slave 
-  pty_master           /dev/ptm      128   0-255 pty:master 
-  pty_slave            /dev/ttyp       3   0-255 pty:slave 
-  pty_master           /dev/pty        2   0-255 pty:master 
-  serial               /dev/cua        5   64-67 serial:callout 
-  serial               /dev/ttyS       4   64-67 serial 
-  /dev/tty0            /dev/tty0       4       0 system:vtmaster 
-  /dev/ptmx            /dev/ptmx       5       2 system 
-  /dev/console         /dev/console    5       1 system:console 
-  /dev/tty             /dev/tty        5       0 system:/dev/tty 
-  unknown              /dev/tty        4    1-63 console 
-
-
-1.8 Miscellaneous kernel statistics in /proc/stat
--------------------------------------------------
-
-Various pieces of information about kernel activity are available in the
-/proc/stat file. All of the numbers reported in this file are aggregates
-since the system first booted. For a quick look, simply cat the file:
-
-  > cat /proc/stat
-  cpu  2255 34 2290 22625563 6290 127 456 0 0 0
-  cpu0 1132 34 1441 11311718 3675 127 438 0 0 0
-  cpu1 1123 0 849 11313845 2614 0 18 0 0 0
-  intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...]
-  ctxt 1990473
-  btime 1062191376
-  processes 2915
-  procs_running 1
-  procs_blocked 0
-  softirq 183433 0 21755 12 39 1137 231 21459 2263
-
-The very first "cpu" line aggregates the numbers in all of the other "cpuN"
-lines. These numbers identify the amount of time the CPU has spent performing
-different kinds of work. Time units are in USER_HZ (typically hundredths of a
-second). The meanings of the columns are as follows, from left to right:
-
-- user: normal processes executing in user mode
-- nice: niced processes executing in user mode
-- system: processes executing in kernel mode
-- idle: twiddling thumbs
-- iowait: In a word, iowait stands for waiting for I/O to complete. But there
-  are several problems:
-  1. The CPU will not wait for I/O to complete; iowait is the time that a task
-     is waiting for I/O to complete. When a CPU goes into the idle state while
-     a task has outstanding I/O, another task will be scheduled on this CPU.
-  2. In a multi-core CPU, the task waiting for I/O to complete is not running
-     on any CPU, so the iowait of each CPU is difficult to calculate.
-  3. The value of the iowait field in /proc/stat will decrease in certain
-     conditions.
-  So, the iowait value read from /proc/stat is not reliable.
-- irq: servicing interrupts
-- softirq: servicing softirqs
-- steal: involuntary wait
-- guest: running a normal guest
-- guest_nice: running a niced guest
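-
-Because these counters only grow over time, CPU utilization is usually derived
-from the difference between two samples. The C sketch below parses the
-aggregate "cpu" line and computes the busy percentage between two samples. The
-struct and function names are hypothetical. The sketch treats idle plus iowait
-as idle time. It sums only the first eight columns, on the assumption that
-guest and guest_nice time is already included in user and nice.
-
-```c
-#include <stdio.h>
-#include <string.h>
-
-/* The first eight columns of the aggregate "cpu" line, in USER_HZ ticks. */
-struct cpu_times {
-    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
-};
-
-/* Parse the aggregate "cpu" line; returns 0 on success, -1 otherwise.
- * Per-CPU lines ("cpu0", "cpu1", ...) are rejected. */
-static int parse_cpu_line(const char *line, struct cpu_times *t)
-{
-    if (strncmp(line, "cpu ", 4) != 0)
-        return -1;
-    return sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
-                  &t->user, &t->nice, &t->system, &t->idle,
-                  &t->iowait, &t->irq, &t->softirq, &t->steal) == 8 ? 0 : -1;
-}
-
-static unsigned long long total_ticks(const struct cpu_times *t)
-{
-    return t->user + t->nice + t->system + t->idle + t->iowait +
-           t->irq + t->softirq + t->steal;
-}
-
-/* Percentage of non-idle time between an earlier sample a and a later b.
- * iowait can decrease (see above), so guard against unsigned wrap-around. */
-static double busy_percent(const struct cpu_times *a, const struct cpu_times *b)
-{
-    unsigned long long ta = total_ticks(a), tb = total_ticks(b);
-    unsigned long long ia = a->idle + a->iowait, ib = b->idle + b->iowait;
-    unsigned long long total = tb > ta ? tb - ta : 0;
-    unsigned long long idle = ib > ia ? ib - ia : 0;
-
-    if (total == 0)
-        return 0.0;
-    if (idle > total)
-        idle = total;
-    return 100.0 * (double)(total - idle) / (double)total;
-}
-```
-
-To use it, sample /proc/stat twice, a second or so apart. Then pass both
-parsed "cpu" lines to busy_percent().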
-
-The "intr" line gives counts of interrupts serviced since boot time, for each
-of the possible system interrupts. The first column is the total of all
-interrupts serviced including unnumbered architecture-specific interrupts;
-each subsequent column is the total for that particular numbered interrupt.
-Unnumbered interrupts are not shown, only summed into the total.
-
-The "ctxt" line gives the total number of context switches across all CPUs.
-
-The "btime" line gives the time at which the system booted, in seconds since
-the Unix epoch.
-
-The "processes" line gives the number of processes and threads created, which
-includes (but is not limited to) those created by calls to the fork() and
-clone() system calls.
-
-The "procs_running" line gives the total number of threads that are
-running or ready to run (i.e., the total number of runnable threads).
-
-The "procs_blocked" line gives the number of processes currently blocked,
-waiting for I/O to complete.
-
-The "softirq" line gives counts of softirqs serviced since boot time, for each
-of the possible system softirqs. The first column is the total of all
-softirqs serviced; each subsequent column is the total for that particular
-softirq.
-
-
-1.9 Ext4 file system parameters
--------------------------------
-
-Information about mounted ext4 file systems can be found in
-/proc/fs/ext4. Each mounted filesystem will have a directory in
-/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
-/proc/fs/ext4/dm-0). The files in each per-device directory are shown
-in Table 1-12, below.
-
-Table 1-12: Files in /proc/fs/ext4/<devname>
-..............................................................................
- File            Content                                        
- mb_groups       details of multiblock allocator buddy cache of free blocks
-..............................................................................
-
-2.0 /proc/consoles
-------------------
-Shows registered system console lines.
-
-To see which character device lines are currently used for the system console
-/dev/console, you may simply look into the file /proc/consoles:
-
-  > cat /proc/consoles
-  tty0                 -WU (ECp)       4:7
-  ttyS0                -W- (Ep)        4:64
-
-The columns are:
-
-  device               name of the device
-  operations           R = can do read operations
-                       W = can do write operations
-                       U = can do unblank
-  flags                E = it is enabled
-                       C = it is the preferred console
-                       B = it is the primary boot console
-                       p = it is used for the printk buffer
-                       b = it is not a TTY but a Braille device
-                       a = it is safe to use when the CPU is offline
-  major:minor          major and minor number of the device separated by a colon
-
-------------------------------------------------------------------------------
-Summary
-------------------------------------------------------------------------------
-The /proc file system serves information about the running system. It not only
-allows access to process data but also allows you to request the kernel status
-by reading files in the hierarchy.
-
-The directory structure of /proc reflects the types of information and makes
-it easy, if not obvious, where to look for specific data.
-------------------------------------------------------------------------------
-
-------------------------------------------------------------------------------
-CHAPTER 2: MODIFYING SYSTEM PARAMETERS
-------------------------------------------------------------------------------
-
-------------------------------------------------------------------------------
-In This Chapter
-------------------------------------------------------------------------------
-* Modifying kernel parameters by writing into files found in /proc/sys
-* Exploring the files which modify certain parameters
-* Review of the /proc/sys file tree
-------------------------------------------------------------------------------
-
-
-A very interesting part of /proc is the directory /proc/sys. This is not only
-a source of information, it also allows you to change parameters within the
-kernel. Be very careful when attempting this. You can optimize your system,
-but you can also cause it to crash. Never alter kernel parameters on a
-production system. Set up a development machine and test to make sure that
-everything works the way you want it to. You may have no alternative but to
-reboot the machine once an error has been made.
-
-To change a value, simply echo the new value into the file. An example is
-given below in the section on the file system data. You need to be root to do
-this. You can create your own boot script to perform this every time your
-system boots.
-
-The files in /proc/sys can be used to fine-tune and monitor miscellaneous and
-general things in the operation of the Linux kernel. Since some of the files
-can inadvertently disrupt your system, it is advisable to read both
-documentation and source before actually making adjustments. In any case, be
-very careful when writing to any of these files. The entries in /proc may
-change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
-review the kernel documentation in the directory /usr/src/linux/Documentation.
-This chapter is heavily based on the documentation included in the pre-2.2
-kernels, and became part of it in version 2.2.1 of the Linux kernel.
-
-Please see the Documentation/admin-guide/sysctl/ directory for descriptions of
-these entries.
-
-------------------------------------------------------------------------------
-Summary
-------------------------------------------------------------------------------
-Certain aspects of kernel behavior can be modified at runtime, without the
-need to recompile the kernel, or even to reboot the system. The files in the
-/proc/sys tree can not only be read, but also modified. You can use the echo
-command to write values into these files, thereby changing the default settings
-of the kernel.
-------------------------------------------------------------------------------
-
-------------------------------------------------------------------------------
-CHAPTER 3: PER-PROCESS PARAMETERS
-------------------------------------------------------------------------------
-
-3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer score
----------------------------------------------------------------------------------
-
-These files can be used to adjust the badness heuristic used to select which
-process gets killed in out-of-memory conditions.
-
-The badness heuristic assigns a value to each candidate task ranging from 0
-(never kill) to 1000 (always kill) to determine which process is targeted.  The
-units are roughly a proportion along that range of allowed memory the process
-may allocate from based on an estimation of its current memory and swap use.
-For example, if a task is using all allowed memory, its badness score will be
-1000.  If it is using half of its allowed memory, its score will be 500.
-
-There is an additional factor included in the badness score: the current memory
-and swap usage is discounted by 3% for root processes.
-
-The amount of "allowed" memory depends on the context in which the oom killer
-was called.  If it is due to the memory assigned to the allocating task's cpuset
-being exhausted, the allowed memory represents the set of mems assigned to that
-cpuset.  If it is due to a mempolicy's node(s) being exhausted, the allowed
-memory represents the set of mempolicy nodes.  If it is due to a memory
-limit (or swap limit) being reached, the allowed memory is that configured
-limit.  Finally, if it is due to the entire system being out of memory, the
-allowed memory represents all allocatable resources.
-
-The value of /proc/<pid>/oom_score_adj is added to the badness score before it
-is used to determine which task to kill.  Acceptable values range from -1000
-(OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX).  This allows userspace to
-polarize the preference for oom killing either by always preferring a certain
-task or completely disabling it.  The lowest possible value, -1000, is
-equivalent to disabling oom killing entirely for that task since it will always
-report a badness score of 0.
-
-Consequently, it is very simple for userspace to define the amount of memory to
-consider for each task.  Setting a /proc/<pid>/oom_score_adj value of +500, for
-example, is roughly equivalent to allowing the remainder of tasks sharing the
-same system, cpuset, mempolicy, or memory controller resources to use at least
-50% more memory.  A value of -500, on the other hand, would be roughly
-equivalent to discounting 50% of the task's allowed memory from being considered
-as scoring against the task.
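-
-The scoring rules described above can be modelled with the following C
-sketch. The model has five parts:
-  - the score is proportional to used/allowed memory, scaled to 0..1000;
-  - root usage gets a 3% discount;
-  - oom_score_adj is added to the score;
-  - the result is clamped to 0..1000;
-  - -1000 means the task is never killed.
-It is a simplified, hypothetical model of this section's description. It is
-not the kernel's actual implementation, and the function name is illustrative.
-
-```c
-#define OOM_SCORE_ADJ_MIN (-1000)
-#define OOM_SCORE_ADJ_MAX 1000
-
-/* Hypothetical model: used and allowed are in pages (memory plus swap). */
-static long model_badness(unsigned long long used, unsigned long long allowed,
-                          int is_root, int oom_score_adj)
-{
-    long long points;
-
-    /* -1000 always reports a badness score of 0. */
-    if (oom_score_adj == OOM_SCORE_ADJ_MIN || allowed == 0)
-        return 0;
-    /* Root processes get their usage discounted by 3%. */
-    if (is_root)
-        used -= used * 3 / 100;
-    /* Proportion of allowed memory, scaled to 0..1000. */
-    points = (long long)(used * 1000 / allowed);
-    points += oom_score_adj;
-    if (points < 0)
-        points = 0;
-    if (points > OOM_SCORE_ADJ_MAX)
-        points = OOM_SCORE_ADJ_MAX;
-    return (long)points;
-}
-```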
-
-For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
-be used to tune the badness score.  Its acceptable values range from -16
-(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
-(OOM_DISABLE) to disable oom killing entirely for that task.  Its value is
-scaled linearly with /proc/<pid>/oom_score_adj.
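-
-The linear scaling between the two interfaces can be sketched in C as shown
-below. The constants use the values named above. This sketch assumes one
-special case: +15 maps to +1000, so that the maximum score stays reachable.
-The helper name is hypothetical.
-
-```c
-#define OOM_DISABLE       (-17)
-#define OOM_ADJUST_MAX    15
-#define OOM_SCORE_ADJ_MAX 1000
-
-/* Hypothetical helper: convert a legacy oom_adj value (-17..15) to the
- * equivalent oom_score_adj value (-1000..1000). */
-static int oom_adj_to_score_adj(int oom_adj)
-{
-    /* Assumed special case so that the maximum score is attainable. */
-    if (oom_adj == OOM_ADJUST_MAX)
-        return OOM_SCORE_ADJ_MAX;
-    /* Linear scaling: -17 (OOM_DISABLE) maps to -1000.
-     * C integer division truncates toward zero. */
-    return oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE;
-}
-```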
-
-The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
-value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
-requires CAP_SYS_RESOURCE.
-
-Caveat: when a parent task is selected, the oom killer will sacrifice any
-first-generation children with separate address spaces instead, if possible.
-This prevents servers and important system daemons from being killed and
-loses the minimal amount of work.
-
-
-3.2 /proc/<pid>/oom_score - Display current oom-killer score
--------------------------------------------------------------
-
-This file can be used to check the current score used by the oom-killer for
-any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
-process should be killed in an out-of-memory situation.
-
-
-3.3  /proc/<pid>/io - Display the IO accounting fields
--------------------------------------------------------
-
-This file contains I/O statistics for each running process.
-
-Example
--------
-
-test:/tmp # dd if=/dev/zero of=/tmp/test.dat &
-[1] 3828
-
-test:/tmp # cat /proc/3828/io
-rchar: 323934931
-wchar: 323929600
-syscr: 632687
-syscw: 632675
-read_bytes: 0
-write_bytes: 323932160
-cancelled_write_bytes: 0
-
-
-Description
------------
-
-rchar
------
-
-I/O counter: chars read
-The number of bytes which this task has caused to be read from storage. This
-is simply the sum of bytes which this process passed to read() and pread().
-It includes things like tty IO and it is unaffected by whether or not actual
-physical disk IO was required (the read might have been satisfied from
-pagecache).
-
-
-wchar
------
-
-I/O counter: chars written
-The number of bytes which this task has caused, or shall cause, to be written
-to disk. Similar caveats apply here as with rchar.
-
-
-syscr
------
-
-I/O counter: read syscalls
-Attempt to count the number of read I/O operations, i.e. syscalls like read()
-and pread().
-
-
-syscw
------
-
-I/O counter: write syscalls
-Attempt to count the number of write I/O operations, i.e. syscalls like
-write() and pwrite().
-
-
-read_bytes
-----------
-
-I/O counter: bytes read
-Attempt to count the number of bytes which this process really did cause to
-be fetched from the storage layer. Done at the submit_bio() level, so it is
-accurate for block-backed filesystems. <please add status regarding NFS and
-CIFS at a later time>
-
-
-write_bytes
------------
-
-I/O counter: bytes written
-Attempt to count the number of bytes which this process caused to be sent to
-the storage layer. This is done at page-dirtying time.
-
-
-cancelled_write_bytes
----------------------
-
-The big inaccuracy here is truncate. If a process writes 1MB to a file and
-then deletes the file, it will in fact perform no writeout. But it will have
-been accounted as having caused 1MB of write.
-In other words: The number of bytes which this process caused to not happen,
-by truncating pagecache. A task can cause "negative" IO too. If this task
-truncates some dirty pagecache, some IO which another task has been accounted
-for (in its write_bytes) will not be happening. We _could_ just subtract that
-from the truncating task's write_bytes, but there is information loss in doing
-that.
-
-
-Note
-----
-
-At its current implementation state, this is a bit racy on 32-bit machines: if
-process A reads process B's /proc/pid/io while process B is updating one of
-those 64-bit counters, process A could see an intermediate result.
-
-
-More information about this can be found within the taskstats documentation in
-Documentation/accounting.
-
-3.4 /proc/<pid>/coredump_filter - Core dump filtering settings
----------------------------------------------------------------
-When a process is dumped, all anonymous memory is written to a core file as
-long as the size of the core file isn't limited. But sometimes we don't want
-to dump some memory segments, for example, huge shared memory or DAX.
-Conversely, sometimes we want to save file-backed memory segments into a core
-file, not only the individual files.
-
-/proc/<pid>/coredump_filter allows you to customize which memory segments
-will be dumped when the <pid> process is dumped. coredump_filter is a bitmask
-of memory types. If a bit of the bitmask is set, memory segments of the
-corresponding memory type are dumped, otherwise they are not dumped.
-
-The following 9 memory types are supported:
-  - (bit 0) anonymous private memory
-  - (bit 1) anonymous shared memory
-  - (bit 2) file-backed private memory
-  - (bit 3) file-backed shared memory
-  - (bit 4) ELF header pages in file-backed private memory areas (it is
-            effective only if the bit 2 is cleared)
-  - (bit 5) hugetlb private memory
-  - (bit 6) hugetlb shared memory
-  - (bit 7) DAX private memory
-  - (bit 8) DAX shared memory
-
-  Note that MMIO pages such as frame buffer are never dumped and vDSO pages
-  are always dumped regardless of the bitmask status.
-
-  Note that bits 0-4 don't affect hugetlb or DAX memory. hugetlb memory is
-  only affected by bit 5-6, and DAX is only affected by bits 7-8.
-
-The default value of coredump_filter is 0x33; this means all anonymous memory
-segments, ELF header pages and hugetlb private memory are dumped.
-
-If you don't want to dump all shared memory segments attached to pid 1234,
-write 0x31 to the process's proc file.
-
-  $ echo 0x31 > /proc/1234/coredump_filter
-
-When a new process is created, the process inherits the bitmask status from its
-parent. It is useful to set up coredump_filter before the program runs.
-For example:
-
-  $ echo 0x7 > /proc/self/coredump_filter
-  $ ./some_program
-
-3.5	/proc/<pid>/mountinfo - Information about mounts
---------------------------------------------------------
-
-This file contains lines of the form:
-
-36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
-(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
-
-(1) mount ID:  unique identifier of the mount (may be reused after umount)
-(2) parent ID:  ID of parent (or of self for the top of the mount tree)
-(3) major:minor:  value of st_dev for files on filesystem
-(4) root:  root of the mount within the filesystem
-(5) mount point:  mount point relative to the process's root
-(6) mount options:  per mount options
-(7) optional fields:  zero or more fields of the form "tag[:value]"
-(8) separator:  marks the end of the optional fields
-(9) filesystem type:  name of filesystem of the form "type[.subtype]"
-(10) mount source:  filesystem specific information or "none"
-(11) super options:  per super block options
-
-Parsers should ignore all unrecognised optional fields.  Currently the
-possible optional fields are:
-
-shared:X  mount is shared in peer group X
-master:X  mount is slave to peer group X
-propagate_from:X  mount is slave and receives propagation from peer group X (*)
-unbindable  mount is unbindable
-
-(*) X is the closest dominant peer group under the process's root.  If
-X is the immediate master of the mount, or if there's no dominant peer
-group under the same root, then only the "master:X" field is present
-and not the "propagate_from:X" field.
-
-For more information on mount propagation see:
-
-  Documentation/filesystems/sharedsubtree.txt
-
-
-3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
---------------------------------------------------------
-These files provide a method to access a tasks comm value. It also allows for
-a task to set its own or one of its thread siblings comm value. The comm value
-is limited in size compared to the cmdline value, so writing anything longer
-then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated
-comm value.
-
-
-3.7	/proc/<pid>/task/<tid>/children - Information about task children
--------------------------------------------------------------------------
-This file provides a fast way to retrieve first level children pids
-of a task pointed by <pid>/<tid> pair. The format is a space separated
-stream of pids.
-
-Note the "first level" here -- if a child has own children they will
-not be listed here, one needs to read /proc/<children-pid>/task/<tid>/children
-to obtain the descendants.
-
-Since this interface is intended to be fast and cheap it doesn't
-guarantee to provide precise results and some children might be
-skipped, especially if they've exited right after we printed their
-pids, so one need to either stop or freeze processes being inspected
-if precise results are needed.
-
-
-3.8	/proc/<pid>/fdinfo/<fd> - Information about opened file
----------------------------------------------------------------
-This file provides information associated with an opened file. The regular
-files have at least three fields -- 'pos', 'flags' and mnt_id. The 'pos'
-represents the current offset of the opened file in decimal form [see lseek(2)
-for details], 'flags' denotes the octal O_xxx mask the file has been
-created with [see open(2) for details] and 'mnt_id' represents mount ID of
-the file system containing the opened file [see 3.5 /proc/<pid>/mountinfo
-for details].
-
-A typical output is
-
-	pos:	0
-	flags:	0100002
-	mnt_id:	19
-
-All locks associated with a file descriptor are shown in its fdinfo too.
-
-lock:       1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF
-
-The files such as eventfd, fsnotify, signalfd, epoll among the regular pos/flags
-pair provide additional information particular to the objects they represent.
-
-	Eventfd files
-	~~~~~~~~~~~~~
-	pos:	0
-	flags:	04002
-	mnt_id:	9
-	eventfd-count:	5a
-
-	where 'eventfd-count' is hex value of a counter.
-
-	Signalfd files
-	~~~~~~~~~~~~~~
-	pos:	0
-	flags:	04002
-	mnt_id:	9
-	sigmask:	0000000000000200
-
-	where 'sigmask' is hex value of the signal mask associated
-	with a file.
-
-	Epoll files
-	~~~~~~~~~~~
-	pos:	0
-	flags:	02
-	mnt_id:	9
-	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7
-
-	where 'tfd' is a target file descriptor number in decimal form,
-	'events' is events mask being watched and the 'data' is data
-	associated with a target [see epoll(7) for more details].
-
-	The 'pos' is current offset of the target file in decimal form
-	[see lseek(2)], 'ino' and 'sdev' are inode and device numbers
-	where target file resides, all in hex format.
-
-	Fsnotify files
-	~~~~~~~~~~~~~~
-	For inotify files the format is the following
-
-	pos:	0
-	flags:	02000000
-	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
-
-	where 'wd' is a watch descriptor in decimal form, ie a target file
-	descriptor number, 'ino' and 'sdev' are inode and device where the
-	target file resides and the 'mask' is the mask of events, all in hex
-	form [see inotify(7) for more details].
-
-	If the kernel was built with exportfs support, the path to the target
-	file is encoded as a file handle.  The file handle is provided by three
-	fields 'fhandle-bytes', 'fhandle-type' and 'f_handle', all in hex
-	format.
-
-	If the kernel is built without exportfs support the file handle won't be
-	printed out.
-
-	If there is no inotify mark attached yet the 'inotify' line will be omitted.
-
-	For fanotify files the format is
-
-	pos:	0
-	flags:	02
-	mnt_id:	9
-	fanotify flags:10 event-flags:0
-	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
-	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
-
-	where fanotify 'flags' and 'event-flags' are values used in fanotify_init
-	call, 'mnt_id' is the mount point identifier, 'mflags' is the value of
-	flags associated with mark which are tracked separately from events
-	mask. 'ino', 'sdev' are target inode and device, 'mask' is the events
-	mask and 'ignored_mask' is the mask of events which are to be ignored.
-	All in hex format. Incorporation of 'mflags', 'mask' and 'ignored_mask'
-	does provide information about flags and mask used in fanotify_mark
-	call [see fsnotify manpage for details].
-
-	While the first three lines are mandatory and always printed, the rest is
-	optional and may be omitted if no marks created yet.
-
-	Timerfd files
-	~~~~~~~~~~~~~
-
-	pos:	0
-	flags:	02
-	mnt_id:	9
-	clockid: 0
-	ticks: 0
-	settime flags: 01
-	it_value: (0, 49406829)
-	it_interval: (1, 0)
-
-	where 'clockid' is the clock type and 'ticks' is the number of the timer expirations
-	that have occurred [see timerfd_create(2) for details]. 'settime flags' are
-	flags in octal form been used to setup the timer [see timerfd_settime(2) for
-	details]. 'it_value' is remaining time until the timer exiration.
-	'it_interval' is the interval for the timer. Note the timer might be set up
-	with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value'
-	still exhibits timer's remaining time.
-
-3.9	/proc/<pid>/map_files - Information about memory mapped files
----------------------------------------------------------------------
-This directory contains symbolic links which represent memory mapped files
-the process is maintaining.  Example output:
-
-     | lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so
-     | lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so
-     | lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so
-     | ...
-     | lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1
-     | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls
-
-The name of a link represents the virtual memory bounds of a mapping, i.e.
-vm_area_struct::vm_start-vm_area_struct::vm_end.
-
-The main purpose of the map_files is to retrieve a set of memory mapped
-files in a fast way instead of parsing /proc/<pid>/maps or
-/proc/<pid>/smaps, both of which contain many more records.  At the same
-time one can open(2) mappings from the listings of two processes and
-comparing their inode numbers to figure out which anonymous memory areas
-are actually shared.
-
-3.10	/proc/<pid>/timerslack_ns - Task timerslack value
----------------------------------------------------------
-This file provides the value of the task's timerslack value in nanoseconds.
-This value specifies a amount of time that normal timers may be deferred
-in order to coalesce timers and avoid unnecessary wakeups.
-
-This allows a task's interactivity vs power consumption trade off to be
-adjusted.
-
-Writing 0 to the file will set the tasks timerslack to the default value.
-
-Valid values are from 0 - ULLONG_MAX
-
-An application setting the value must have PTRACE_MODE_ATTACH_FSCREDS level
-permissions on the task specified to change its timerslack_ns value.
-
-3.11	/proc/<pid>/patch_state - Livepatch patch operation state
------------------------------------------------------------------
-When CONFIG_LIVEPATCH is enabled, this file displays the value of the
-patch state for the task.
-
-A value of '-1' indicates that no patch is in transition.
-
-A value of '0' indicates that a patch is in transition and the task is
-unpatched.  If the patch is being enabled, then the task hasn't been
-patched yet.  If the patch is being disabled, then the task has already
-been unpatched.
-
-A value of '1' indicates that a patch is in transition and the task is
-patched.  If the patch is being enabled, then the task has already been
-patched.  If the patch is being disabled, then the task hasn't been
-unpatched yet.
-
-3.12 /proc/<pid>/arch_status - task architecture specific status
--------------------------------------------------------------------
-When CONFIG_PROC_PID_ARCH_STATUS is enabled, this file displays the
-architecture specific status of the task.
-
-Example
--------
- $ cat /proc/6753/arch_status
- AVX512_elapsed_ms:      8
-
-Description
------------
-
-x86 specific entries:
----------------------
- AVX512_elapsed_ms:
- ------------------
-  If AVX512 is supported on the machine, this entry shows the milliseconds
-  elapsed since the last time AVX512 usage was recorded. The recording
-  happens on a best effort basis when a task is scheduled out. This means
-  that the value depends on two factors:
-
-    1) The time which the task spent on the CPU without being scheduled
-       out. With CPU isolation and a single runnable task this can take
-       several seconds.
-
-    2) The time since the task was scheduled out last. Depending on the
-       reason for being scheduled out (time slice exhausted, syscall ...)
-       this can be arbitrary long time.
-
-  As a consequence the value cannot be considered precise and authoritative
-  information. The application which uses this information has to be aware
-  of the overall scenario on the system in order to determine whether a
-  task is a real AVX512 user or not. Precise information can be obtained
-  with performance counters.
-
-  A special value of '-1' indicates that no AVX512 usage was recorded, thus
-  the task is unlikely an AVX512 user, but depends on the workload and the
-  scheduling scenario, it also could be a false negative mentioned above.
-
-------------------------------------------------------------------------------
-Configuring procfs
-------------------------------------------------------------------------------
-
-4.1	Mount options
----------------------
-
-The following mount options are supported:
-
-	hidepid=	Set /proc/<pid>/ access mode.
-	gid=		Set the group authorized to learn processes information.
-
-hidepid=0 means classic mode - everybody may access all /proc/<pid>/ directories
-(default).
-
-hidepid=1 means users may not access any /proc/<pid>/ directories but their
-own.  Sensitive files like cmdline, sched*, status are now protected against
-other users.  This makes it impossible to learn whether any user runs
-specific program (given the program doesn't reveal itself by its behaviour).
-As an additional bonus, as /proc/<pid>/cmdline is unaccessible for other users,
-poorly written programs passing sensitive information via program arguments are
-now protected against local eavesdroppers.
-
-hidepid=2 means hidepid=1 plus all /proc/<pid>/ will be fully invisible to other
-users.  It doesn't mean that it hides a fact whether a process with a specific
-pid value exists (it can be learned by other means, e.g. by "kill -0 $PID"),
-but it hides process' uid and gid, which may be learned by stat()'ing
-/proc/<pid>/ otherwise.  It greatly complicates an intruder's task of gathering
-information about running processes, whether some daemon runs with elevated
-privileges, whether other user runs some sensitive program, whether other users
-run any program at all, etc.
-
-gid= defines a group authorized to learn processes information otherwise
-prohibited by hidepid=.  If you use some daemon like identd which needs to learn
-information about processes information, just add identd to this group.
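The mountinfo line format from section 3.5 of the text above can be illustrated with a small parser. This is a sketch, not kernel or libmount code. The `struct mountinfo_entry` type and the `parse_mountinfo_line()` and `next_field()` helpers are hypothetical names. Octal escapes such as `\040` in paths are deliberately left undecoded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one /proc/<pid>/mountinfo line, numbered as in section 3.5. */
struct mountinfo_entry {
	int mount_id;		/* (1) */
	int parent_id;		/* (2) */
	unsigned int major;	/* (3) */
	unsigned int minor;	/* (3) */
	char *root;		/* (4) */
	char *mount_point;	/* (5) */
	char *mount_options;	/* (6) */
	char *fs_type;		/* (9) */
	char *source;		/* (10) */
	char *super_options;	/* (11) */
};

/* Hypothetical helper: returns the next space-separated field of *cursor,
 * NUL-terminating it in place, or NULL when the line is exhausted. */
static char *next_field(char **cursor)
{
	char *start = *cursor + strspn(*cursor, " \n");
	char *end;

	if (*start == '\0')
		return NULL;
	end = start + strcspn(start, " \n");
	if (*end != '\0')
		*end++ = '\0';
	*cursor = end;
	return start;
}

/* Hypothetical helper: splits a writable copy of one mountinfo line in place.
 * Optional fields (7) are skipped up to the "-" separator (8), so unknown tags
 * are ignored. Returns 0 on success, -1 on a malformed line. */
static int parse_mountinfo_line(char *line, struct mountinfo_entry *e)
{
	char *cur;
	char *tok;
	int n = 0;

	assert(line != NULL && e != NULL);
	if (sscanf(line, "%d %d %u:%u%n", &e->mount_id, &e->parent_id,
		   &e->major, &e->minor, &n) != 4)
		return -1;

	cur = line + n;
	e->root = next_field(&cur);
	e->mount_point = next_field(&cur);
	e->mount_options = next_field(&cur);
	if (!e->root || !e->mount_point || !e->mount_options)
		return -1;

	/* Skip zero or more optional fields such as "shared:X" or "master:X". */
	do {
		tok = next_field(&cur);
	} while (tok != NULL && strcmp(tok, "-") != 0);
	if (tok == NULL)
		return -1;	/* no "-" separator found */

	e->fs_type = next_field(&cur);
	e->source = next_field(&cur);
	e->super_options = next_field(&cur);
	if (!e->fs_type || !e->source || !e->super_options)
		return -1;
	return 0;
}
```

To list the calling process's mounts, each line read with fgets() from /proc/self/mountinfo could be passed to parse_mountinfo_line(). A line that lacks the "-" separator is rejected rather than guessed at.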
-- 


From d5eefa2c5e567751df74d38d5b8cec7ed6e7a08c Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:19 +0100
Subject: docs: filesystems: convert qnx6.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/ccd22c1e1426ce4cb30ece9a71c39ebb41844762.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/qnx6.rst  | 196 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/qnx6.txt  | 174 --------------------------------
 3 files changed, 197 insertions(+), 174 deletions(-)
 create mode 100644 Documentation/filesystems/qnx6.rst
 delete mode 100644 Documentation/filesystems/qnx6.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 671906e2fee6..08883a481a76 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -82,5 +82,6 @@ Documentation for filesystem implementations.
    orangefs
    overlayfs
    proc
+   qnx6
    virtiofs
    vfat
diff --git a/Documentation/filesystems/qnx6.rst b/Documentation/filesystems/qnx6.rst
new file mode 100644
index 000000000000..b71308314070
--- /dev/null
+++ b/Documentation/filesystems/qnx6.rst
@@ -0,0 +1,196 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================
+The QNX6 Filesystem
+===================
+
+The qnx6fs is used by newer QNX operating system versions (e.g. Neutrino).
+It was introduced in QNX 6.4.0 and has been the default since 6.4.1.
+
+Option
+======
+
+mmi_fs		Mount the filesystem as used, for example, by the Audi MMI 3G system
+
+Specification
+=============
+
+qnx6fs shares many properties with traditional Unix filesystems. It has the
+concepts of blocks, inodes and directories.
+
+On QNX it is possible to create both little-endian and big-endian qnx6
+filesystems. This makes it possible to create a filesystem for a target
+platform (QNX is used on quite a range of embedded systems) whose endianness
+differs from that of the system creating it.
+
+The Linux driver handles both endiannesses (LE and BE) transparently.
+
+Blocks
+------
+
+The space in the device or file is split up into blocks. These have a fixed
+size of 512, 1024, 2048 or 4096 bytes, which is chosen when the filesystem
+is created.
+
+Block pointers are 32 bits wide, so the maximum space that can be addressed
+is 2^32 * 4096 bytes, or 16TB.
+
+The superblocks
+---------------
+
+The superblock contains all global information about the filesystem.
+Each qnx6fs has two superblocks, each one having a 64-bit serial number.
+That serial number is used to identify the "active" superblock.
+In write mode, with each new snapshot (after each synchronous write), the
+serial of the new master superblock is increased (old superblock serial + 1).
+
+So basically the snapshot functionality is realized by an atomic final
+update of the serial number. Before updating that serial, all modifications
+are done by copying all modified blocks during that specific write request
+(or period) and building up a new (stable) filesystem structure under the
+inactive superblock.
+
+Each superblock holds a set of root inodes for the different filesystem
+parts (Inode, Bitmap and Longfilenames).
+Each of these root nodes holds information like total size of the stored
+data and the addressing levels in that specific tree.
+If the level value is 0, up to 16 direct blocks can be addressed by each
+node.
+
+Level 1 adds an additional indirect addressing level where each indirect
+addressing block holds up to blocksize / 4 block pointers (4 bytes each).
+Level 2 adds another indirect addressing block level (so, with 1024-byte
+blocks, already up to 16 * 256 * 256 = 1048576 blocks can be addressed).
+
+Unused block pointers are always set to ~0, regardless of whether they are in
+a root node, an indirect addressing block or an inode.
+
+Data leaves are always on the lowest level. So no data is stored on upper
+tree levels.
+
+The first superblock is located at 0x2000 (0x2000 is the bootblock size).
+On the Audi MMI 3G, the first superblock starts directly at byte 0.
+
+The position of the second superblock can be calculated either from the
+superblock information (total number of filesystem blocks) or by taking the
+highest device address, zeroing the last 3 bytes and then subtracting 0x1000
+from that address.
+
+0x1000 bytes are reserved for each superblock, regardless of the blocksize
+of the filesystem.
+
+Inodes
+------
+
+Each object in the filesystem is represented by an inode (index node).
+The inode structure contains pointers to the filesystem blocks which contain
+the data held in the object and all of the metadata about an object except
+its long name (filenames longer than 27 characters).
+The metadata about an object includes the permissions, owner, group, flags,
+size, number of blocks used, access time, change time and modification time.
+
+The object mode field uses the POSIX format (which makes things easier).
+
+The inode also holds pointers to the first 16 blocks, if the object data can
+be addressed with 16 direct blocks.
+
+For more than 16 blocks, indirect addressing in the form of another tree is
+used (the scheme is the same as the one used for the superblock root nodes).
+
+The file size is stored as a 64-bit value. Inode numbering starts at 1
+(while long filename inodes start at 0).
+
+Directories
+-----------
+
+A directory is a filesystem object and has an inode just like a file.
+It is a specially formatted file containing records which associate each
+name with an inode number.
+
+'.' inode number points to the directory inode
+
+'..' inode number points to the parent directory inode
+
+Each filename record additionally has a filename length field.
+
+Long filenames or subdirectory names are a special case.
+
+Their directory record has the filename length field set to 0xff and also
+stores the long filename inode number.
+
+Using that long filename inode number, the long filename tree can be walked
+starting from the superblock long filename root node pointers.
+
+Special files
+-------------
+
+Symbolic links are also filesystem objects with inodes. They have a specific
+bit in the inode mode field identifying them as symbolic links.
+
+The directory entry file inode pointer points to the target file inode.
+
+Hard links have an inode and a directory entry, but with a specific mode bit
+set, no block pointers, and the directory file record pointing to the target
+file inode.
+
+Character and block special devices do not exist in QNX as those files
+are handled by the QNX kernel/drivers and created in /dev independent of the
+underlying filesystem.
+
+Long filenames
+--------------
+
+Long filenames are stored in a separate addressing tree. The starting point
+is the long filename root node in the active superblock.
+
+Each data block (tree leaf) holds one long filename. That filename is
+limited to 510 bytes. The first two bytes are used as the length field
+for the actual filename.
+
+This structure has to fit into every allowed block size; with the smallest
+being 512 bytes, the 2-byte length field leaves 510 bytes for the filename.
+
+Bitmap
+------
+
+The qnx6fs filesystem allocation bitmap is stored in a tree under the bitmap
+root node in the superblock, and each bit in the bitmap represents one
+filesystem block.
+
+The first block is block 0, which starts 0x1000 bytes after the start of the
+superblock. So for a normal qnx6fs, block 0 is located at physical address
+0x3000 (bootblock + superblock).
+
+If the device is smaller than the space addressable by the bitmap, the bits
+at the end of the last bitmap block are set to 1.
+
+Bitmap system area
+------------------
+
+The bitmap itself is divided into three parts.
+
+First comes the system area, which is split into two halves.
+
+Then comes the userspace area.
+
+The requirement for a static, fixed preallocated system area comes from how
+qnx6fs deals with writes.
+
+Each superblock has its own half of the system area. So superblock #1
+always uses blocks from the lower half, while superblock #2 only writes to
+blocks represented by the upper half of the bitmap system area bits.
+
+Bitmap blocks, Inode blocks and indirect addressing blocks for those two
+tree structures are treated as system blocks.
+
+The rationale behind this is that a write request can work on a new snapshot
+(the system area of the inactive, i.e. lower serial numbered, superblock)
+while at the same time there is still a complete, stable filesystem structure
+in the other half of the system area.
+
+When writing is finished (a sync write is completed, the maximum sync leap
+time is reached, or a filesystem sync is requested), the serial of the
+previously inactive superblock is atomically increased and the fs switches
+over to that superblock, which is then declared stable.
+
+For all data outside the system area, blocks are just copied while writing.
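The layout arithmetic in the qnx6 description above can be sketched in C. This is a sketch, not the fs/qnx6 driver code, and the macro and function names are hypothetical. Two points are assumptions rather than statements from the text. First, the sketch assumes block 0 also starts 0x1000 bytes after the first superblock in the Audi MMI 3G layout. Second, it assumes the second superblock directly follows the last filesystem block.

```c
#include <assert.h>
#include <stdint.h>

#define QNX6_BOOTBLOCK_SIZE	0x2000	/* first superblock offset (non-MMI) */
#define QNX6_SUPERBLOCK_AREA	0x1000	/* bytes reserved for each superblock */
#define QNX6_DIRECT_PTRS	16	/* direct pointers per root node or inode */

static int qnx6_valid_blocksize(uint32_t bs)
{
	return bs == 512 || bs == 1024 || bs == 2048 || bs == 4096;
}

/* Byte offset of the first superblock; mmi_fs selects the Audi MMI 3G case. */
static uint64_t qnx6_first_sb_offset(int mmi_fs)
{
	return mmi_fs ? 0 : QNX6_BOOTBLOCK_SIZE;
}

/* Byte offset of filesystem block 'blk'. Block 0 starts 0x1000 after the
 * first superblock (assumed to hold for the MMI layout as well). */
static uint64_t qnx6_block_offset(int mmi_fs, uint32_t blocksize, uint32_t blk)
{
	assert(qnx6_valid_blocksize(blocksize));
	return qnx6_first_sb_offset(mmi_fs) + QNX6_SUPERBLOCK_AREA +
	       (uint64_t)blk * blocksize;
}

/* Assumption: the second superblock directly follows the last of the
 * num_blocks filesystem blocks counted in the superblock. */
static uint64_t qnx6_second_sb_offset(int mmi_fs, uint32_t blocksize,
				      uint32_t num_blocks)
{
	return qnx6_block_offset(mmi_fs, blocksize, num_blocks);
}

/* Data blocks reachable from a root node or inode whose tree has 'levels'
 * indirect levels: 16 direct pointers, each indirect block holding
 * blocksize / 4 32-bit pointers. */
static uint64_t qnx6_tree_capacity(uint32_t blocksize, unsigned int levels)
{
	uint64_t blocks = QNX6_DIRECT_PTRS;
	unsigned int i;

	assert(qnx6_valid_blocksize(blocksize));
	for (i = 0; i < levels; i++)
		blocks *= blocksize / 4;
	return blocks;
}
```

With a 1024-byte block size, qnx6_tree_capacity(1024, 2) yields the 16 * 256 * 256 = 1048576 blocks quoted in the text. For a normal filesystem, qnx6_block_offset(0, 1024, 0) yields 0x3000.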
diff --git a/Documentation/filesystems/qnx6.txt b/Documentation/filesystems/qnx6.txt
deleted file mode 100644
index 48ea68f15845..000000000000
--- a/Documentation/filesystems/qnx6.txt
+++ /dev/null
@@ -1,174 +0,0 @@
-The QNX6 Filesystem
-===================
-
-The qnx6fs is used by newer QNX operating system versions. (e.g. Neutrino)
-It got introduced in QNX 6.4.0 and is used default since 6.4.1.
-
-Option
-======
-
-mmi_fs		Mount filesystem as used for example by Audi MMI 3G system
-
-Specification
-=============
-
-qnx6fs shares many properties with traditional Unix filesystems. It has the
-concepts of blocks, inodes and directories.
-On QNX it is possible to create little endian and big endian qnx6 filesystems.
-This feature makes it possible to create and use a different endianness fs
-for the target (QNX is used on quite a range of embedded systems) platform
-running on a different endianness.
-The Linux driver handles endianness transparently. (LE and BE)
-
-Blocks
-------
-
-The space in the device or file is split up into blocks. These are a fixed
-size of 512, 1024, 2048 or 4096, which is decided when the filesystem is
-created.
-Blockpointers are 32bit, so the maximum space that can be addressed is
-2^32 * 4096 bytes or 16TB
-
-The superblocks
----------------
-
-The superblock contains all global information about the filesystem.
-Each qnx6fs got two superblocks, each one having a 64bit serial number.
-That serial number is used to identify the "active" superblock.
-In write mode with reach new snapshot (after each synchronous write), the
-serial of the new master superblock is increased (old superblock serial + 1)
-
-So basically the snapshot functionality is realized by an atomic final
-update of the serial number. Before updating that serial, all modifications
-are done by copying all modified blocks during that specific write request
-(or period) and building up a new (stable) filesystem structure under the
-inactive superblock.
-
-Each superblock holds a set of root inodes for the different filesystem
-parts. (Inode, Bitmap and Longfilenames)
-Each of these root nodes holds information like total size of the stored
-data and the addressing levels in that specific tree.
-If the level value is 0, up to 16 direct blocks can be addressed by each
-node.
-Level 1 adds an additional indirect addressing level where each indirect
-addressing block holds up to blocksize / 4 bytes pointers to data blocks.
-Level 2 adds an additional indirect addressing block level (so, already up
-to 16 * 256 * 256 = 1048576 blocks that can be addressed by such a tree).
-
-Unused block pointers are always set to ~0 - regardless of root node,
-indirect addressing blocks or inodes.
-Data leaves are always on the lowest level. So no data is stored on upper
-tree levels.
-
-The first Superblock is located at 0x2000. (0x2000 is the bootblock size)
-The Audi MMI 3G first superblock directly starts at byte 0.
-Second superblock position can either be calculated from the superblock
-information (total number of filesystem blocks) or by taking the highest
-device address, zeroing the last 3 bytes and then subtracting 0x1000 from
-that address.
-
-0x1000 is the size reserved for each superblock - regardless of the
-blocksize of the filesystem.
-
-Inodes
-------
-
-Each object in the filesystem is represented by an inode. (index node)
-The inode structure contains pointers to the filesystem blocks which contain
-the data held in the object and all of the metadata about an object except
-its longname. (filenames longer than 27 characters)
-The metadata about an object includes the permissions, owner, group, flags,
-size, number of blocks used, access time, change time and modification time.
-
-Object mode field is POSIX format. (which makes things easier)
-
-There are also pointers to the first 16 blocks, if the object data can be
-addressed with 16 direct blocks.
-For more than 16 blocks an indirect addressing in form of another tree is
-used. (scheme is the same as the one used for the superblock root nodes)
-
-The filesize is stored 64bit. Inode counting starts with 1. (while long
-filename inodes start with 0)
-
-Directories
------------
-
-A directory is a filesystem object and has an inode just like a file.
-It is a specially formatted file containing records which associate each
-name with an inode number.
-'.' inode number points to the directory inode
-'..' inode number points to the parent directory inode
-Eeach filename record additionally got a filename length field.
-
-One special case are long filenames or subdirectory names.
-These got set a filename length field of 0xff in the corresponding directory
-record plus the longfile inode number also stored in that record.
-With that longfilename inode number, the longfilename tree can be walked
-starting with the superblock longfilename root node pointers.
-
-Special files
--------------
-
-Symbolic links are also filesystem objects with inodes. They got a specific
-bit in the inode mode field identifying them as symbolic link.
-The directory entry file inode pointer points to the target file inode.
-
-Hard links got an inode, a directory entry, but a specific mode bit set,
-no block pointers and the directory file record pointing to the target file
-inode.
-
-Character and block special devices do not exist in QNX as those files
-are handled by the QNX kernel/drivers and created in /dev independent of the
-underlaying filesystem.
-
-Long filenames
---------------
-
-Long filenames are stored in a separate addressing tree. The staring point
-is the longfilename root node in the active superblock.
-Each data block (tree leaves) holds one long filename. That filename is
-limited to 510 bytes. The first two starting bytes are used as length field
-for the actual filename.
-If that structure shall fit for all allowed blocksizes, it is clear why there
-is a limit of 510 bytes for the actual filename stored.
-
-Bitmap
-------
-
-The qnx6fs filesystem allocation bitmap is stored in a tree under bitmap
-root node in the superblock and each bit in the bitmap represents one
-filesystem block.
-The first block is block 0, which starts 0x1000 after superblock start.
-So for a normal qnx6fs 0x3000 (bootblock + superblock) is the physical
-address at which block 0 is located.
-
-Bits at the end of the last bitmap block are set to 1, if the device is
-smaller than addressing space in the bitmap.
-
-Bitmap system area
-------------------
-
-The bitmap itself is divided into three parts.
-First the system area, that is split into two halves.
-Then userspace.
-
-The requirement for a static, fixed preallocated system area comes from how
-qnx6fs deals with writes.
-Each superblock got it's own half of the system area. So superblock #1
-always uses blocks from the lower half while superblock #2 just writes to
-blocks represented by the upper half bitmap system area bits.
-
-Bitmap blocks, Inode blocks and indirect addressing blocks for those two
-tree structures are treated as system blocks.
-
-The rational behind that is that a write request can work on a new snapshot
-(system area of the inactive - resp. lower serial numbered superblock) while
-at the same time there is still a complete stable filesystem structer in the
-other half of the system area.
-
-When finished with writing (a sync write is completed, the maximum sync leap
-time or a filesystem sync is requested), serial of the previously inactive
-superblock atomically is increased and the fs switches over to that - then
-stable declared - superblock.
-
-For all data outside the system area, blocks are just copied while writing.
-- 


From 8979fc9a282441d086ead589528c711d9df3d94a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:20 +0100
Subject: docs: filesystems: convert ramfs-rootfs-initramfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Use notes markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/89cbcc99a6371f3bff3ea1668fe497e8a15c226b.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst                |   1 +
 .../filesystems/ramfs-rootfs-initramfs.rst         | 369 +++++++++++++++++++++
 .../filesystems/ramfs-rootfs-initramfs.txt         | 359 --------------------
 3 files changed, 370 insertions(+), 359 deletions(-)
 create mode 100644 Documentation/filesystems/ramfs-rootfs-initramfs.rst
 delete mode 100644 Documentation/filesystems/ramfs-rootfs-initramfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 08883a481a76..b8689d082911 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -83,5 +83,6 @@ Documentation for filesystem implementations.
    overlayfs
    proc
    qnx6
+   ramfs-rootfs-initramfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.rst b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
new file mode 100644
index 000000000000..6c576e241d86
--- /dev/null
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.rst
@@ -0,0 +1,369 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+Ramfs, rootfs and initramfs
+===========================
+
+October 17, 2005
+
+:Author: Rob Landley <rob@landley.net>
+
+
+What is ramfs?
+--------------
+
+Ramfs is a very simple filesystem that exports Linux's disk caching
+mechanisms (the page cache and dentry cache) as a dynamically resizable
+RAM-based filesystem.
+
+Normally all files are cached in memory by Linux.  Pages of data read from
+backing store (usually the block device the filesystem is mounted on) are kept
+around in case they're needed again, but marked as clean (freeable) in case the
+Virtual Memory system needs the memory for something else.  Similarly, data
+written to files is marked clean as soon as it has been written to backing
+store, but kept around for caching purposes until the VM reallocates the
+memory.  A similar mechanism (the dentry cache) greatly speeds up access to
+directories.
+
+With ramfs, there is no backing store.  Files written into ramfs allocate
+dentries and page cache as usual, but there's nowhere to write them to.
+This means the pages are never marked clean, so they can't be freed by the
+VM when it's looking to recycle memory.
+
+The amount of code required to implement ramfs is tiny, because all the
+work is done by the existing Linux caching infrastructure.  Basically,
+you're mounting the disk cache as a filesystem.  Because of this, ramfs is not
+an optional component removable via menuconfig, since there would be negligible
+space savings.
+
+ramfs and ramdisk:
+------------------
+
+The older "ram disk" mechanism created a synthetic block device out of
+an area of RAM and used it as backing store for a filesystem.  This block
+device was of fixed size, so the filesystem mounted on it was of fixed
+size.  Using a ram disk also required unnecessarily copying memory from the
+fake block device into the page cache (and copying changes back out), as well
+as creating and destroying dentries.  Plus it needed a filesystem driver
+(such as ext2) to format and interpret this data.
+
+Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
+unnecessary work for the CPU, and pollutes the CPU caches.  (There are tricks
+to avoid this copying by playing with the page tables, but they're unpleasantly
+complicated and turn out to be about as expensive as the copying anyway.)
+More to the point, all the work ramfs is doing has to happen *anyway*,
+since all file access goes through the page and dentry caches.  The RAM
+disk is simply unnecessary; ramfs is internally much simpler.
+
+Another reason ramdisks are semi-obsolete is that the introduction of
+loopback devices offered a more flexible and convenient way to create
+synthetic block devices, now from files instead of from chunks of memory.
+See losetup(8) for details.
+
+ramfs and tmpfs:
+----------------
+
+One downside of ramfs is that you can keep writing data into it until you fill
+up all memory, and the VM can't free it because the VM thinks that files
+should get written to backing store (rather than swap space), but ramfs hasn't
+got any backing store.  Because of this, only root (or a trusted user) should
+be allowed write access to a ramfs mount.
+
+A ramfs derivative called tmpfs was created to add size limits, and the ability
+to write the data to swap space.  Normal users can be allowed write access to
+tmpfs mounts.  See Documentation/filesystems/tmpfs.txt for more information.
+
+What is rootfs?
+---------------
+
+Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
+always present in 2.6 systems.  You can't unmount rootfs for approximately the
+same reason you can't kill the init process; rather than having special code
+to check for and handle an empty list, it's smaller and simpler for the kernel
+to just make sure certain lists can't become empty.
+
+Most systems just mount another filesystem over rootfs and ignore it.  The
+amount of space an empty instance of ramfs takes up is tiny.
+
+If CONFIG_TMPFS is enabled, rootfs will use tmpfs instead of ramfs by
+default.  To force ramfs, add "rootfstype=ramfs" to the kernel command
+line.
+
+What is initramfs?
+------------------
+
+All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
+extracted into rootfs when the kernel boots up.  After extracting, the kernel
+checks to see if rootfs contains a file "init", and if so it executes it as
+PID 1.  If found, this init process is responsible for bringing the system the
+rest of the way up, including locating and mounting the real root device (if
+any).  If rootfs does not contain an init program after the embedded cpio
+archive is extracted into it, the kernel will fall through to the older code
+to locate and mount a root partition, then exec some variant of /sbin/init
+out of that.
+
+All this differs from the old initrd in several ways:
+
+  - The old initrd was always a separate file, while the initramfs archive is
+    linked into the linux kernel image.  (The directory ``linux-*/usr`` is
+    devoted to generating this archive during the build.)
+
+  - The old initrd file was a gzipped filesystem image (in some file format,
+    such as ext2, that needed a driver built into the kernel), while the new
+    initramfs archive is a gzipped cpio archive (like tar only simpler,
+    see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst).
+    The kernel's cpio extraction code is not only extremely small, it's also
+    ``__init`` text and data that can be discarded during the boot process.
+
+  - The program run by the old initrd (which was called /initrd, not /init) did
+    some setup and then returned to the kernel, while the init program from
+    initramfs is not expected to return to the kernel.  (If /init needs to hand
+    off control it can overmount / with a new root device and exec another init
+    program.  See the switch_root utility, below.)
+
+  - When switching to another root device, initrd would pivot_root and then
+    unmount the ramdisk.  But initramfs is rootfs: you can neither pivot_root
+    rootfs, nor unmount it.  Instead, delete everything out of rootfs to
+    free up the space (``find -xdev / -exec rm '{}' ';'``), overmount rootfs
+    with the new root (``cd /newmount; mount --move . /; chroot .``), attach
+    stdin/stdout/stderr to the new /dev/console, and exec the new init.
+
+    Since this is a remarkably persnickety process (and involves deleting
+    commands before you can run them), the klibc package introduced a helper
+    program (utils/run_init.c) to do all this for you.  Most other packages
+    (such as busybox) have named this command "switch_root".
+
+Populating initramfs:
+---------------------
+
+The 2.6 kernel build process always creates a gzipped cpio format initramfs
+archive and links it into the resulting kernel binary.  By default, this
+archive is empty (consuming 134 bytes on x86).
+
+The config option CONFIG_INITRAMFS_SOURCE (in General Setup in menuconfig,
+and living in usr/Kconfig) can be used to specify a source for the
+initramfs archive, which will automatically be incorporated into the
+resulting binary.  This option can point to an existing gzipped cpio
+archive, a directory containing files to be archived, or a text file
+specification such as the following example::
+
+  dir /dev 755 0 0
+  nod /dev/console 644 0 0 c 5 1
+  nod /dev/loop0 644 0 0 b 7 0
+  dir /bin 755 1000 1000
+  slink /bin/sh busybox 777 0 0
+  file /bin/busybox initramfs/busybox 755 0 0
+  dir /proc 755 0 0
+  dir /sys 755 0 0
+  dir /mnt 755 0 0
+  file /init initramfs/init.sh 755 0 0
+
+Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
+documenting the above file format.
+
+One advantage of the configuration file is that root access is not required to
+set permissions or create device nodes in the new archive.  (Note that those
+two example "file" entries expect to find files named "init.sh" and "busybox" in
+a directory called "initramfs", under the linux-2.6.* directory.  See
+Documentation/driver-api/early-userspace/early_userspace_support.rst for more details.)
+
+The kernel does not depend on external cpio tools.  If you specify a
+directory instead of a configuration file, the kernel's build infrastructure
+creates a configuration file from that directory (usr/Makefile calls
+usr/gen_initramfs_list.sh), and proceeds to package up that directory
+using the config file (by feeding it to usr/gen_init_cpio, which is created
+from usr/gen_init_cpio.c).  The kernel's build-time cpio creation code is
+entirely self-contained, and the kernel's boot-time extractor is also
+(obviously) self-contained.
+
+The one thing you might need external cpio utilities installed for is creating
+or extracting your own preprepared cpio files to feed to the kernel build
+(instead of a config file or directory).
+
+The following command line can extract a cpio image (created either by the
+above script or by the kernel build) back into its component files::
+
+  cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
+
+The following shell script can create a prebuilt cpio archive you can
+use in place of the above config file::
+
+  #!/bin/sh
+
+  # Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation.
+  # Licensed under GPL version 2
+
+  if [ $# -ne 2 ]
+  then
+    echo "usage: mkinitramfs directory imagename.cpio.gz"
+    exit 1
+  fi
+
+  if [ -d "$1" ]
+  then
+    echo "creating $2 from $1"
+    (cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
+  else
+    echo "First argument must be a directory"
+    exit 1
+  fi
+
+.. Note::
+
+   The cpio man page contains some bad advice that will break your initramfs
+   archive if you follow it.  It says "A typical way to generate the list
+   of filenames is with the find command; you should give find the -depth
+   option to minimize problems with permissions on directories that are
+   unwritable or not searchable."  Don't do this when creating
+   initramfs.cpio.gz images; it won't work.  The Linux kernel cpio extractor
+   won't create files in a directory that doesn't exist, so the directory
+   entries must go before the files that go in those directories.
+   The above script gets them in the right order.
+
+External initramfs images:
+--------------------------
+
+If the kernel has initrd support enabled, an external cpio.gz archive can also
+be passed into a 2.6 kernel in place of an initrd.  In this case, the kernel
+will autodetect the type (initramfs, not initrd) and extract the external cpio
+archive into rootfs before trying to run /init.
+
+This has the memory efficiency advantages of initramfs (no ramdisk block
+device) but the separate packaging of initrd (which is nice if you have
+non-GPL code you'd like to run from initramfs, without conflating it with
+the GPL licensed Linux kernel binary).
+
+It can also be used to supplement the kernel's built-in initramfs image.  The
+files in the external archive will overwrite any conflicting files in
+the built-in initramfs archive.  Some distributors also prefer to customize
+a single kernel image with task-specific initramfs images, without recompiling.
+
+Contents of initramfs:
+----------------------
+
+An initramfs archive is a complete self-contained root filesystem for Linux.
+If you don't already understand what shared libraries, devices, and paths
+you need to get a minimal root filesystem up and running, here are some
+references:
+
+- http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
+- http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
+- http://www.linuxfromscratch.org/lfs/view/stable/
+
+The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
+designed to be a tiny C library to statically link early userspace
+code against, along with some related utilities.  It is BSD licensed.
+
+I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
+myself.  These are LGPL and GPL, respectively.  (A self-contained initramfs
+package is planned for the busybox 1.3 release.)
+
+In theory you could use glibc, but that's not well suited for small embedded
+uses like this.  (A "hello world" program statically linked against glibc is
+over 400k.  With uClibc it's 7k.  Also note that glibc dlopens libnss to do
+name lookups, even when otherwise statically linked.)
+
+A good first step is to get initramfs to run a statically linked "hello world"
+program as init, and test it under an emulator like qemu (www.qemu.org) or
+User Mode Linux, like so::
+
+  cat > hello.c << EOF
+  #include <stdio.h>
+  #include <unistd.h>
+
+  int main(int argc, char *argv[])
+  {
+    printf("Hello world!\n");
+    sleep(999999999);
+  }
+  EOF
+  gcc -static hello.c -o init
+  echo init | cpio -o -H newc | gzip > test.cpio.gz
+  # Testing external initramfs using the initrd loading mechanism.
+  qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero
+
+When debugging a normal root filesystem, it's nice to be able to boot with
+"init=/bin/sh".  The initramfs equivalent is "rdinit=/bin/sh", and it's
+just as useful.
+
+Why cpio rather than tar?
+-------------------------
+
+This decision was made back in December, 2001.  The discussion started here:
+
+  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html
+
+And spawned a second thread (specifically on tar vs cpio), starting here:
+
+  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html
+
+The quick and dirty summary version (which is no substitute for reading
+the above threads) is:
+
+1) cpio is a standard.  It's decades old (from the AT&T days), and already
+   widely used on Linux (inside RPM, Red Hat's device driver disks).  Here's
+   a Linux Journal article about it from 1996:
+
+      http://www.linuxjournal.com/article/1213
+
+   It's not as popular as tar because the traditional cpio command line tools
+   require *truly hideous* command line arguments.  But that says nothing
+   either way about the archive format, and there are alternative tools,
+   such as:
+
+     http://freecode.com/projects/afio
+
+2) The cpio archive format chosen by the kernel is simpler and cleaner (and
+   thus easier to create and parse) than any of the (literally dozens of)
+   various tar archive formats.  The complete initramfs archive format is
+   explained in Documentation/driver-api/early-userspace/buffer-format.rst,
+   created in usr/gen_init_cpio.c, and extracted in init/initramfs.c.  All
+   three together come to less than 26k total of human-readable text.
+
+3) The GNU project standardizing on tar is approximately as relevant as
+   Windows standardizing on zip.  Linux is not part of either, and is free
+   to make its own technical decisions.
+
+4) Since this is a kernel internal format, it could easily have been
+   something brand new.  The kernel provides its own tools to create and
+   extract this format anyway.  Using an existing standard was preferable,
+   but not essential.
+
+5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
+   supported on the kernel side"):
+
+      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html
+
+   explained his reasoning:
+
+     - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
+     - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
+
+   and, most importantly, designed and implemented the initramfs code.
+
+Future directions:
+------------------
+
+Today (2.6.16), initramfs is always compiled in, but not always used.  The
+kernel falls back to legacy boot code that is reached only if initramfs does
+not contain an /init program.  The fallback is legacy code, there to ensure a
+smooth transition and to allow early boot functionality to gradually move to
+"early userspace" (i.e. initramfs).
+
+The move to early userspace is necessary because finding and mounting the real
+root device is complex.  Root partitions can span multiple devices (raid or
+separate journal).  They can be out on the network (requiring dhcp, setting a
+specific MAC address, logging into a server, etc).  They can live on removable
+media, with dynamically allocated major/minor numbers and persistent naming
+issues requiring a full udev implementation to sort out.  They can be
+compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
+and so on.
+
+This kind of complexity (which inevitably includes policy) is rightly handled
+in userspace.  Both klibc and busybox/uClibc are working on simple initramfs
+packages to drop into a kernel build.
+
+The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
+The kernel's current early boot code (partition detection, etc.) will probably
+be migrated into a default initramfs, automatically created and used by the
+kernel build.
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
deleted file mode 100644
index 97d42ccaa92d..000000000000
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt
+++ /dev/null
@@ -1,359 +0,0 @@
-ramfs, rootfs and initramfs
-October 17, 2005
-Rob Landley <rob@landley.net>
-=============================
-
-What is ramfs?
---------------
-
-Ramfs is a very simple filesystem that exports Linux's disk caching
-mechanisms (the page cache and dentry cache) as a dynamically resizable
-RAM-based filesystem.
-
-Normally all files are cached in memory by Linux.  Pages of data read from
-backing store (usually the block device the filesystem is mounted on) are kept
-around in case it's needed again, but marked as clean (freeable) in case the
-Virtual Memory system needs the memory for something else.  Similarly, data
-written to files is marked clean as soon as it has been written to backing
-store, but kept around for caching purposes until the VM reallocates the
-memory.  A similar mechanism (the dentry cache) greatly speeds up access to
-directories.
-
-With ramfs, there is no backing store.  Files written into ramfs allocate
-dentries and page cache as usual, but there's nowhere to write them to.
-This means the pages are never marked clean, so they can't be freed by the
-VM when it's looking to recycle memory.
-
-The amount of code required to implement ramfs is tiny, because all the
-work is done by the existing Linux caching infrastructure.  Basically,
-you're mounting the disk cache as a filesystem.  Because of this, ramfs is not
-an optional component removable via menuconfig, since there would be negligible
-space savings.
-
-ramfs and ramdisk:
-------------------
-
-The older "ram disk" mechanism created a synthetic block device out of
-an area of RAM and used it as backing store for a filesystem.  This block
-device was of fixed size, so the filesystem mounted on it was of fixed
-size.  Using a ram disk also required unnecessarily copying memory from the
-fake block device into the page cache (and copying changes back out), as well
-as creating and destroying dentries.  Plus it needed a filesystem driver
-(such as ext2) to format and interpret this data.
-
-Compared to ramfs, this wastes memory (and memory bus bandwidth), creates
-unnecessary work for the CPU, and pollutes the CPU caches.  (There are tricks
-to avoid this copying by playing with the page tables, but they're unpleasantly
-complicated and turn out to be about as expensive as the copying anyway.)
-More to the point, all the work ramfs is doing has to happen _anyway_,
-since all file access goes through the page and dentry caches.  The RAM
-disk is simply unnecessary; ramfs is internally much simpler.
-
-Another reason ramdisks are semi-obsolete is that the introduction of
-loopback devices offered a more flexible and convenient way to create
-synthetic block devices, now from files instead of from chunks of memory.
-See losetup (8) for details.
-
-ramfs and tmpfs:
-----------------
-
-One downside of ramfs is you can keep writing data into it until you fill
-up all memory, and the VM can't free it because the VM thinks that files
-should get written to backing store (rather than swap space), but ramfs hasn't
-got any backing store.  Because of this, only root (or a trusted user) should
-be allowed write access to a ramfs mount.
-
-A ramfs derivative called tmpfs was created to add size limits, and the ability
-to write the data to swap space.  Normal users can be allowed write access to
-tmpfs mounts.  See Documentation/filesystems/tmpfs.txt for more information.
-
-What is rootfs?
----------------
-
-Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
-always present in 2.6 systems.  You can't unmount rootfs for approximately the
-same reason you can't kill the init process; rather than having special code
-to check for and handle an empty list, it's smaller and simpler for the kernel
-to just make sure certain lists can't become empty.
-
-Most systems just mount another filesystem over rootfs and ignore it.  The
-amount of space an empty instance of ramfs takes up is tiny.
-
-If CONFIG_TMPFS is enabled, rootfs will use tmpfs instead of ramfs by
-default.  To force ramfs, add "rootfstype=ramfs" to the kernel command
-line.
-
-What is initramfs?
-------------------
-
-All 2.6 Linux kernels contain a gzipped "cpio" format archive, which is
-extracted into rootfs when the kernel boots up.  After extracting, the kernel
-checks to see if rootfs contains a file "init", and if so it executes it as PID
-1.  If found, this init process is responsible for bringing the system the
-rest of the way up, including locating and mounting the real root device (if
-any).  If rootfs does not contain an init program after the embedded cpio
-archive is extracted into it, the kernel will fall through to the older code
-to locate and mount a root partition, then exec some variant of /sbin/init
-out of that.
-
-All this differs from the old initrd in several ways:
-
-  - The old initrd was always a separate file, while the initramfs archive is
-    linked into the linux kernel image.  (The directory linux-*/usr is devoted
-    to generating this archive during the build.)
-
-  - The old initrd file was a gzipped filesystem image (in some file format,
-    such as ext2, that needed a driver built into the kernel), while the new
-    initramfs archive is a gzipped cpio archive (like tar only simpler,
-    see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst).  The
-    kernel's cpio extraction code is not only extremely small, it's also
-    __init text and data that can be discarded during the boot process.
-
-  - The program run by the old initrd (which was called /initrd, not /init) did
-    some setup and then returned to the kernel, while the init program from
-    initramfs is not expected to return to the kernel.  (If /init needs to hand
-    off control it can overmount / with a new root device and exec another init
-    program.  See the switch_root utility, below.)
-
-  - When switching another root device, initrd would pivot_root and then
-    umount the ramdisk.  But initramfs is rootfs: you can neither pivot_root
-    rootfs, nor unmount it.  Instead delete everything out of rootfs to
-    free up the space (find -xdev / -exec rm '{}' ';'), overmount rootfs
-    with the new root (cd /newmount; mount --move . /; chroot .), attach
-    stdin/stdout/stderr to the new /dev/console, and exec the new init.
-
-    Since this is a remarkably persnickety process (and involves deleting
-    commands before you can run them), the klibc package introduced a helper
-    program (utils/run_init.c) to do all this for you.  Most other packages
-    (such as busybox) have named this command "switch_root".
-
-Populating initramfs:
----------------------
-
-The 2.6 kernel build process always creates a gzipped cpio format initramfs
-archive and links it into the resulting kernel binary.  By default, this
-archive is empty (consuming 134 bytes on x86).
-
-The config option CONFIG_INITRAMFS_SOURCE (in General Setup in menuconfig,
-and living in usr/Kconfig) can be used to specify a source for the
-initramfs archive, which will automatically be incorporated into the
-resulting binary.  This option can point to an existing gzipped cpio
-archive, a directory containing files to be archived, or a text file
-specification such as the following example:
-
-  dir /dev 755 0 0
-  nod /dev/console 644 0 0 c 5 1
-  nod /dev/loop0 644 0 0 b 7 0
-  dir /bin 755 1000 1000
-  slink /bin/sh busybox 777 0 0
-  file /bin/busybox initramfs/busybox 755 0 0
-  dir /proc 755 0 0
-  dir /sys 755 0 0
-  dir /mnt 755 0 0
-  file /init initramfs/init.sh 755 0 0
-
-Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
-documenting the above file format.
-
-One advantage of the configuration file is that root access is not required to
-set permissions or create device nodes in the new archive.  (Note that those
-two example "file" entries expect to find files named "init.sh" and "busybox" in
-a directory called "initramfs", under the linux-2.6.* directory.  See
-Documentation/driver-api/early-userspace/early_userspace_support.rst for more details.)
-
-The kernel does not depend on external cpio tools.  If you specify a
-directory instead of a configuration file, the kernel's build infrastructure
-creates a configuration file from that directory (usr/Makefile calls
-usr/gen_initramfs_list.sh), and proceeds to package up that directory
-using the config file (by feeding it to usr/gen_init_cpio, which is created
-from usr/gen_init_cpio.c).  The kernel's build-time cpio creation code is
-entirely self-contained, and the kernel's boot-time extractor is also
-(obviously) self-contained.
-
-The one thing you might need external cpio utilities installed for is creating
-or extracting your own preprepared cpio files to feed to the kernel build
-(instead of a config file or directory).
-
-The following command line can extract a cpio image (either by the above script
-or by the kernel build) back into its component files:
-
-  cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
-
-The following shell script can create a prebuilt cpio archive you can
-use in place of the above config file:
-
-  #!/bin/sh
-
-  # Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation.
-  # Licensed under GPL version 2
-
-  if [ $# -ne 2 ]
-  then
-    echo "usage: mkinitramfs directory imagename.cpio.gz"
-    exit 1
-  fi
-
-  if [ -d "$1" ]
-  then
-    echo "creating $2 from $1"
-    (cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
-  else
-    echo "First argument must be a directory"
-    exit 1
-  fi
-
-Note: The cpio man page contains some bad advice that will break your initramfs
-archive if you follow it.  It says "A typical way to generate the list
-of filenames is with the find command; you should give find the -depth option
-to minimize problems with permissions on directories that are unwritable or not
-searchable."  Don't do this when creating initramfs.cpio.gz images, it won't
-work.  The Linux kernel cpio extractor won't create files in a directory that
-doesn't exist, so the directory entries must go before the files that go in
-those directories.  The above script gets them in the right order.
-
-External initramfs images:
---------------------------
-
-If the kernel has initrd support enabled, an external cpio.gz archive can also
-be passed into a 2.6 kernel in place of an initrd.  In this case, the kernel
-will autodetect the type (initramfs, not initrd) and extract the external cpio
-archive into rootfs before trying to run /init.
-
-This has the memory efficiency advantages of initramfs (no ramdisk block
-device) but the separate packaging of initrd (which is nice if you have
-non-GPL code you'd like to run from initramfs, without conflating it with
-the GPL licensed Linux kernel binary).
-
-It can also be used to supplement the kernel's built-in initramfs image.  The
-files in the external archive will overwrite any conflicting files in
-the built-in initramfs archive.  Some distributors also prefer to customize
-a single kernel image with task-specific initramfs images, without recompiling.
-
-Contents of initramfs:
-----------------------
-
-An initramfs archive is a complete self-contained root filesystem for Linux.
-If you don't already understand what shared libraries, devices, and paths
-you need to get a minimal root filesystem up and running, here are some
-references:
-http://www.tldp.org/HOWTO/Bootdisk-HOWTO/
-http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html
-http://www.linuxfromscratch.org/lfs/view/stable/
-
-The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is
-designed to be a tiny C library to statically link early userspace
-code against, along with some related utilities.  It is BSD licensed.
-
-I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
-myself.  These are LGPL and GPL, respectively.  (A self-contained initramfs
-package is planned for the busybox 1.3 release.)
-
-In theory you could use glibc, but that's not well suited for small embedded
-uses like this.  (A "hello world" program statically linked against glibc is
-over 400k.  With uClibc it's 7k.  Also note that glibc dlopens libnss to do
-name lookups, even when otherwise statically linked.)
-
-A good first step is to get initramfs to run a statically linked "hello world"
-program as init, and test it under an emulator like qemu (www.qemu.org) or
-User Mode Linux, like so:
-
-  cat > hello.c << EOF
-  #include <stdio.h>
-  #include <unistd.h>
-
-  int main(int argc, char *argv[])
-  {
-    printf("Hello world!\n");
-    sleep(999999999);
-  }
-  EOF
-  gcc -static hello.c -o init
-  echo init | cpio -o -H newc | gzip > test.cpio.gz
-  # Testing external initramfs using the initrd loading mechanism.
-  qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero
-
-When debugging a normal root filesystem, it's nice to be able to boot with
-"init=/bin/sh".  The initramfs equivalent is "rdinit=/bin/sh", and it's
-just as useful.
-
-Why cpio rather than tar?
--------------------------
-
-This decision was made back in December, 2001.  The discussion started here:
-
-  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html
-
-And spawned a second thread (specifically on tar vs cpio), starting here:
-
-  http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html
-
-The quick and dirty summary version (which is no substitute for reading
-the above threads) is:
-
-1) cpio is a standard.  It's decades old (from the AT&T days), and already
-   widely used on Linux (inside RPM, Red Hat's device driver disks).  Here's
-   a Linux Journal article about it from 1996:
-
-      http://www.linuxjournal.com/article/1213
-
-   It's not as popular as tar because the traditional cpio command line tools
-   require _truly_hideous_ command line arguments.  But that says nothing
-   either way about the archive format, and there are alternative tools,
-   such as:
-
-     http://freecode.com/projects/afio
-
-2) The cpio archive format chosen by the kernel is simpler and cleaner (and
-   thus easier to create and parse) than any of the (literally dozens of)
-   various tar archive formats.  The complete initramfs archive format is
-   explained in buffer-format.txt, created in usr/gen_init_cpio.c, and
-   extracted in init/initramfs.c.  All three together come to less than 26k
-   total of human-readable text.
-
-3) The GNU project standardizing on tar is approximately as relevant as
-   Windows standardizing on zip.  Linux is not part of either, and is free
-   to make its own technical decisions.
-
-4) Since this is a kernel internal format, it could easily have been
-   something brand new.  The kernel provides its own tools to create and
-   extract this format anyway.  Using an existing standard was preferable,
-   but not essential.
-
-5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
-   supported on the kernel side"):
-
-      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html
-
-   explained his reasoning:
-
-      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
-      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
-
-   and, most importantly, designed and implemented the initramfs code.
-
-Future directions:
-------------------
-
-Today (2.6.16), initramfs is always compiled in, but not always used.  The
-kernel falls back to legacy boot code that is reached only if initramfs does
-not contain an /init program.  The fallback is legacy code, there to ensure a
-smooth transition and allowing early boot functionality to gradually move to
-"early userspace" (I.E. initramfs).
-
-The move to early userspace is necessary because finding and mounting the real
-root device is complex.  Root partitions can span multiple devices (raid or
-separate journal).  They can be out on the network (requiring dhcp, setting a
-specific MAC address, logging into a server, etc).  They can live on removable
-media, with dynamically allocated major/minor numbers and persistent naming
-issues requiring a full udev implementation to sort out.  They can be
-compressed, encrypted, copy-on-write, loopback mounted, strangely partitioned,
-and so on.
-
-This kind of complexity (which inevitably includes policy) is rightly handled
-in userspace.  Both klibc and busybox/uClibc are working on simple initramfs
-packages to drop into a kernel build.
-
-The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
-The kernel's current early boot code (partition detection, etc) will probably
-be migrated into a default initramfs, automatically created and used by the
-kernel build.
-- 
cgit 


From 56e6d5c0eb7b862b4c984107e665821722413008 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:21 +0100
Subject: docs: filesystems: convert relay.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Use notes markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/f48bb0fdf64d197f28c6f469adb61a7a091adb75.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/relay.rst | 501 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/relay.txt | 494 -----------------------------------
 3 files changed, 502 insertions(+), 494 deletions(-)
 create mode 100644 Documentation/filesystems/relay.rst
 delete mode 100644 Documentation/filesystems/relay.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index b8689d082911..0aade8146d4d 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -84,5 +84,6 @@ Documentation for filesystem implementations.
    proc
    qnx6
    ramfs-rootfs-initramfs
+   relay
    virtiofs
    vfat
diff --git a/Documentation/filesystems/relay.rst b/Documentation/filesystems/relay.rst
new file mode 100644
index 000000000000..04ad083cfe62
--- /dev/null
+++ b/Documentation/filesystems/relay.rst
@@ -0,0 +1,501 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+relay interface (formerly relayfs)
+==================================
+
+The relay interface provides a means for kernel applications to
+efficiently log and transfer large quantities of data from the kernel
+to userspace via user-defined 'relay channels'.
+
+A 'relay channel' is a kernel->user data relay mechanism implemented
+as a set of per-cpu kernel buffers ('channel buffers'), each
+represented as a regular file ('relay file') in user space.  Kernel
+clients write into the channel buffers using efficient write
+functions; these automatically log into the current cpu's channel
+buffer.  User space applications mmap() or read() from the relay files
+and retrieve the data as it becomes available.  The relay files
+themselves are files created in a host filesystem, e.g. debugfs, and
+are associated with the channel buffers using the API described below.
+
+The format of the data logged into the channel buffers is completely
+up to the kernel client; the relay interface does however provide
+hooks which allow kernel clients to impose some structure on the
+buffer data.  The relay interface doesn't implement any form of data
+filtering - this also is left to the kernel client.  The purpose is to
+keep things as simple as possible.
+
+This document provides an overview of the relay interface API.  The
+details of the function parameters are documented along with the
+functions in the relay interface code - please see that for details.
+
+Semantics
+=========
+
+Each relay channel has one buffer per CPU; each buffer has one or more
+sub-buffers.  Messages are written to the first sub-buffer until it is
+too full to contain a new message, in which case it is written to
+the next (if available).  Messages are never split across sub-buffers.
+At this point, userspace can be notified so it empties the first
+sub-buffer, while the kernel continues writing to the next.
+
+When notified that a sub-buffer is full, the kernel knows how many
+bytes of it are padding, i.e. unused space left because a complete
+message couldn't fit into a sub-buffer.  Userspace can use this
+knowledge to copy only valid data.
+
+After copying it, userspace can notify the kernel that a sub-buffer
+has been consumed.
+
+A relay channel can operate in a mode where it will overwrite data not
+yet collected by userspace, and not wait for it to be consumed.
+
+The relay channel itself does not provide for communication of such
+data between userspace and kernel, allowing the kernel side to remain
+simple and not impose a single interface on userspace.  It does
+provide a set of examples and a separate helper though, described
+below.
+
+The read() interface both removes padding and internally consumes the
+read sub-buffers; thus in cases where read(2) is being used to drain
+the channel buffers, special-purpose communication between kernel and
+user isn't necessary for basic operation.
+
+One of the major goals of the relay interface is to provide a low
+overhead mechanism for conveying kernel data to userspace.  While the
+read() interface is easy to use, it's not as efficient as the mmap()
+approach; the example code attempts to make the tradeoff between the
+two approaches as small as possible.
+
+klog and relay-apps example code
+================================
+
+The relay interface itself is ready to use, but to make things easier,
+a couple of simple utility functions and a set of examples are provided.
+
+The relay-apps example tarball, available on the relay sourceforge
+site, contains a set of self-contained examples, each consisting of a
+pair of .c files containing boilerplate code for each of the user and
+kernel sides of a relay application.  When combined, these two sets of
+boilerplate code provide glue to easily stream data to disk, without
+having to bother with mundane housekeeping chores.
+
+The 'klog debugging functions' patch (klog.patch in the relay-apps
+tarball) provides a couple of high-level logging functions to the
+kernel which allow writing formatted text or raw data to a channel,
+regardless of whether a channel to write into exists or not, or even
+whether the relay interface is compiled into the kernel or not.  These
+functions allow you to put unconditional 'trace' statements anywhere
+in the kernel or kernel modules; only when there is a 'klog handler'
+registered will data actually be logged (see the klog and kleak
+examples for details).
+
+It is of course possible to use the relay interface from scratch,
+i.e. without using any of the relay-apps example code or klog, but
+you'll have to implement communication between userspace and kernel,
+allowing both to convey the state of buffers (full, empty, amount of
+padding).  The read() interface both removes padding and internally
+consumes the read sub-buffers; thus in cases where read(2) is being
+used to drain the channel buffers, special-purpose communication
+between kernel and user isn't necessary for basic operation.  Things
+such as buffer-full conditions would still need to be communicated via
+some channel though.
+
+klog and the relay-apps examples can be found in the relay-apps
+tarball at http://relayfs.sourceforge.net.
+
+The relay interface user space API
+==================================
+
+The relay interface implements basic file operations for user space
+access to relay channel buffer data.  Here are the file operations
+that are available and some comments regarding their behavior:
+
+=========== ============================================================
+open()      enables the user to open an *existing* channel buffer.
+
+mmap()      results in channel buffer being mapped into the caller's
+	    memory space. Note that you can't do a partial mmap - you
+	    must map the entire file, which is n_subbufs * subbuf_size.
+
+read()      reads the contents of a channel buffer.  The bytes read are
+	    'consumed' by the reader, i.e. they won't be available
+	    again to subsequent reads.  If the channel is being used
+	    in no-overwrite mode (the default), it can be read at any
+	    time even if there's an active kernel writer.  If the
+	    channel is being used in overwrite mode and there are
+	    active channel writers, results may be unpredictable -
+	    users should make sure that all logging to the channel has
+	    ended before using read() with overwrite mode.  Sub-buffer
+	    padding is automatically removed and will not be seen by
+	    the reader.
+
+sendfile()  transfers data from a channel buffer to an output file
+	    descriptor. Sub-buffer padding is automatically removed
+	    and will not be seen by the reader.
+
+poll()      POLLIN/POLLRDNORM/POLLERR supported.  User applications are
+	    notified when sub-buffer boundaries are crossed.
+
+close()     decrements the channel buffer's refcount.  When the refcount
+	    reaches 0, i.e. when no process or kernel client has the
+	    buffer open, the channel buffer is freed.
+=========== ============================================================
+
+In order for a user application to make use of relay files, the
+host filesystem must be mounted.  For example::
+
+	mount -t debugfs debugfs /sys/kernel/debug
+
+.. Note::
+
+	The host filesystem doesn't need to be mounted for kernel
+	clients to create or use channels - it only needs to be
+	mounted when user space applications need access to the buffer
+	data.
+
+
+The relay interface kernel API
+==============================
+
+Here's a summary of the API the relay interface provides to in-kernel clients:
+
+
+  channel management functions::
+
+    relay_open(base_filename, parent, subbuf_size, n_subbufs,
+               callbacks, private_data)
+    relay_close(chan)
+    relay_flush(chan)
+    relay_reset(chan)
+
+  channel management typically called on instigation of userspace::
+
+    relay_subbufs_consumed(chan, cpu, subbufs_consumed)
+
+  write functions::
+
+    relay_write(chan, data, length)
+    __relay_write(chan, data, length)
+    relay_reserve(chan, length)
+
+  callbacks::
+
+    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
+    buf_mapped(buf, filp)
+    buf_unmapped(buf, filp)
+    create_buf_file(filename, parent, mode, buf, is_global)
+    remove_buf_file(dentry)
+
+  helper functions::
+
+    relay_buf_full(buf)
+    subbuf_start_reserve(buf, length)
+
+
+Creating a channel
+------------------
+
+relay_open() is used to create a channel, along with its per-cpu
+channel buffers.  Each channel buffer will have an associated file
+created for it in the host filesystem, which can be mmapped or
+read from in user space.  The files are named basename0...basenameN-1
+where N is the number of online cpus, and by default will be created
+in the root of the filesystem (if the parent param is NULL).  If you
+want a directory structure to contain your relay files, you should
+create it using the host filesystem's directory creation function,
+e.g. debugfs_create_dir(), and pass the parent directory to
+relay_open().  Users are responsible for cleaning up any directory
+structure they create when the channel is closed; again, the host
+filesystem's directory removal functions should be used for that,
+e.g. debugfs_remove().
+
+In order for a channel to be created and the host filesystem's files
+associated with its channel buffers, the user must provide definitions
+for two callback functions, create_buf_file() and remove_buf_file().
+create_buf_file() is called once for each per-cpu buffer from
+relay_open() and allows the user to create the file which will be used
+to represent the corresponding channel buffer.  The callback should
+return the dentry of the file created to represent the channel buffer.
+remove_buf_file() must also be defined; it's responsible for deleting
+the file(s) created in create_buf_file() and is called during
+relay_close().
+
+Here are some typical definitions for these callbacks, in this case
+using debugfs::
+
+    /*
+    * create_buf_file() callback.  Creates relay file in debugfs.
+    */
+    static struct dentry *create_buf_file_handler(const char *filename,
+						struct dentry *parent,
+						umode_t mode,
+						struct rchan_buf *buf,
+						int *is_global)
+    {
+	    return debugfs_create_file(filename, mode, parent, buf,
+				    &relay_file_operations);
+    }
+
+    /*
+    * remove_buf_file() callback.  Removes relay file from debugfs.
+    */
+    static int remove_buf_file_handler(struct dentry *dentry)
+    {
+	    debugfs_remove(dentry);
+
+	    return 0;
+    }
+
+    /*
+    * relay interface callbacks
+    */
+    static struct rchan_callbacks relay_callbacks =
+    {
+	    .create_buf_file = create_buf_file_handler,
+	    .remove_buf_file = remove_buf_file_handler,
+    };
+
+And an example relay_open() invocation using them::
+
+  chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL);
+
+If the create_buf_file() callback fails, or isn't defined, channel
+creation and thus relay_open() will fail.
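+
+Putting these pieces together, a minimal module might look roughly like
+the sketch below.  It assumes the relay_callbacks structure from the
+debugfs example above is defined in the same file; the directory name
+"relay-example", the SUBBUF_SIZE and N_SUBBUFS values and the
+example_init()/example_exit() names are hypothetical.  Check the
+relay_open() prototype in include/linux/relay.h for the kernel version
+you are targeting::
+
+    #include <linux/debugfs.h>
+    #include <linux/errno.h>
+    #include <linux/module.h>
+    #include <linux/relay.h>
+
+    #define SUBBUF_SIZE 262144  /* hypothetical: 256 KiB per sub-buffer */
+    #define N_SUBBUFS   4       /* hypothetical: 4 sub-buffers per cpu */
+
+    static struct dentry *dir;
+    static struct rchan *chan;
+
+    static int __init example_init(void)
+    {
+            /* Our own directory; relay files appear as relay-example/cpuN. */
+            dir = debugfs_create_dir("relay-example", NULL);
+
+            chan = relay_open("cpu", dir, SUBBUF_SIZE, N_SUBBUFS,
+                              &relay_callbacks, NULL);
+            if (!chan) {
+                    debugfs_remove_recursive(dir);
+                    return -ENOMEM;
+            }
+
+            relay_write(chan, "hello\n", 6);
+            return 0;
+    }
+
+    static void __exit example_exit(void)
+    {
+            relay_flush(chan);      /* finalize the last sub-buffers */
+            relay_close(chan);      /* calls remove_buf_file() per buffer */
+            debugfs_remove_recursive(dir);  /* the directory is ours to remove */
+    }
+
+    module_init(example_init);
+    module_exit(example_exit);
+    MODULE_LICENSE("GPL");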
+
+The total size of each per-cpu buffer is calculated by multiplying the
+number of sub-buffers by the sub-buffer size passed into relay_open().
+The idea behind sub-buffers is that they're basically an extension of
+double-buffering to N buffers, and they also allow applications to
+easily implement random-access-on-buffer-boundary schemes, which can
+be important for some high-volume applications.  The number and size
+of sub-buffers are completely dependent on the application, and even for
+the same application, different conditions will warrant different
+values for these parameters at different times.  Typically, the right
+values to use are best decided after some experimentation; in general,
+though, it's safe to assume that having only 1 sub-buffer is a bad
+idea - you're guaranteed to either overwrite data or lose events
+depending on the channel mode being used.
+
+The create_buf_file() implementation can also be defined in such a way
+as to allow the creation of a single 'global' buffer instead of the
+default per-cpu set.  This can be useful for applications interested
+mainly in seeing the relative ordering of system-wide events without
+the need to bother with saving explicit timestamps for the purpose of
+merging/sorting per-cpu files in a postprocessing step.
+
+To have relay_open() create a global buffer, the create_buf_file()
+implementation should set the value of the is_global outparam to a
+non-zero value in addition to creating the file that will be used to
+represent the single buffer.  In the case of a global buffer,
+create_buf_file() and remove_buf_file() will be called only once.  The
+normal channel-writing functions, e.g. relay_write(), can still be
+used - writes from any cpu will transparently end up in the global
+buffer - but since it is a global buffer, callers should make sure
+they use the proper locking for such a buffer, either by wrapping
+writes in a spinlock, or by copying a write function from relay.h and
+creating a local version that internally does the proper locking.
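+
+A global-buffer variant of the debugfs callback, plus one possible locked
+write wrapper, could look like the following sketch.  The names
+global_lock, create_global_buf_file() and global_relay_write() are
+hypothetical.  Taking a spinlock with interrupts disabled is only one way
+to serialize writers::
+
+    #include <linux/debugfs.h>
+    #include <linux/relay.h>
+    #include <linux/spinlock.h>
+
+    static DEFINE_SPINLOCK(global_lock);
+
+    static struct dentry *create_global_buf_file(const char *filename,
+                                                 struct dentry *parent,
+                                                 umode_t mode,
+                                                 struct rchan_buf *buf,
+                                                 int *is_global)
+    {
+            *is_global = 1;         /* one buffer shared by all cpus */
+            return debugfs_create_file(filename, mode, parent, buf,
+                                       &relay_file_operations);
+    }
+
+    /* Serialize writers running on different cpus into the one buffer. */
+    static void global_relay_write(struct rchan *chan,
+                                   const void *data, size_t length)
+    {
+            unsigned long flags;
+
+            spin_lock_irqsave(&global_lock, flags);
+            __relay_write(chan, data, length);
+            spin_unlock_irqrestore(&global_lock, flags);
+    }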
+
+The private_data passed into relay_open() allows clients to associate
+user-defined data with a channel, and is immediately available
+(including in create_buf_file()) via chan->private_data or
+buf->chan->private_data.
+
+Buffer-only channels
+--------------------
+
+These channels have no associated files and can be created with
+relay_open(NULL, NULL, ...). Such channels are useful in scenarios such
+as when doing early tracing in the kernel, before the VFS is up. In these
+cases, one may open a buffer-only channel and then call
+relay_late_setup_files() when the kernel is ready to handle files,
+to expose the buffered data to userspace.
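+
+As a rough sketch, assuming built-in (non-modular) code and reusing the
+relay_callbacks structure from the debugfs example above, the two steps
+might be split across initcalls as below.  The early_trace_start() and
+early_trace_expose() names, the sizes, the "early" base filename and the
+choice of initcall levels are assumptions made for illustration only::
+
+    #include <linux/errno.h>
+    #include <linux/init.h>
+    #include <linux/relay.h>
+
+    #define EARLY_SUBBUF_SIZE   65536   /* hypothetical sizing */
+    #define EARLY_N_SUBBUFS     8
+
+    static struct rchan *early_chan;
+
+    /* No base filename and no parent: a memory-only channel. */
+    static int __init early_trace_start(void)
+    {
+            early_chan = relay_open(NULL, NULL, EARLY_SUBBUF_SIZE,
+                                    EARLY_N_SUBBUFS, &relay_callbacks, NULL);
+            return early_chan ? 0 : -ENOMEM;
+    }
+    early_initcall(early_trace_start);
+
+    /* Files can be created now: early0 ... earlyN-1 in the fs root. */
+    static int __init early_trace_expose(void)
+    {
+            if (!early_chan)
+                    return -ENODEV;
+            return relay_late_setup_files(early_chan, "early", NULL);
+    }
+    late_initcall(early_trace_expose);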
+
+Channel 'modes'
+---------------
+
+relay channels can be used in either of two modes - 'overwrite' or
+'no-overwrite'.  The mode is entirely determined by the implementation
+of the subbuf_start() callback, as described below.  The default if no
+subbuf_start() callback is defined is 'no-overwrite' mode.  If the
+default mode suits your needs, and you plan to use the read()
+interface to retrieve channel data, you can ignore the details of this
+section, as it pertains mainly to mmap() implementations.
+
+In 'overwrite' mode, also known as 'flight recorder' mode, writes
+continuously cycle around the buffer and will never fail, but will
+unconditionally overwrite old data regardless of whether it's actually
+been consumed.  In no-overwrite mode, writes will fail, i.e. data will
+be lost, if the number of unconsumed sub-buffers equals the total
+number of sub-buffers in the channel.  It should be clear that if
+there is no consumer or if the consumer can't consume sub-buffers fast
+enough, data will be lost in either case; the only difference is
+whether data is lost from the beginning or the end of a buffer.
+
+As explained above, a relay channel is made up of one or more
+per-cpu channel buffers, each implemented as a circular buffer
+subdivided into one or more sub-buffers.  Messages are written into
+the current sub-buffer of the channel's current per-cpu buffer via the
+write functions described below.  Whenever a message can't fit into
+the current sub-buffer, because there's no room left for it, the
+client is notified via the subbuf_start() callback that a switch to a
+new sub-buffer is about to occur.  The client uses this callback to 1)
+initialize the next sub-buffer if appropriate, 2) finalize the previous
+sub-buffer if appropriate, and 3) return a boolean value indicating
+whether or not to actually move on to the next sub-buffer.
+
+To implement 'no-overwrite' mode, the kernel client would provide
+an implementation of the subbuf_start() callback something like the
+following::
+
+    static int subbuf_start(struct rchan_buf *buf,
+			    void *subbuf,
+			    void *prev_subbuf,
+			    size_t prev_padding)
+    {
+	    if (prev_subbuf)
+		    *((unsigned *)prev_subbuf) = prev_padding;
+
+	    if (relay_buf_full(buf))
+		    return 0;
+
+	    subbuf_start_reserve(buf, sizeof(unsigned int));
+
+	    return 1;
+    }
+
+If the current buffer is full, i.e. all sub-buffers remain unconsumed,
+the callback returns 0 to indicate that the buffer switch should not
+occur yet, i.e. until the consumer has had a chance to read the
+current set of ready sub-buffers.  For the relay_buf_full() function
+to make sense, the consumer is responsible for notifying the relay
+interface when sub-buffers have been consumed via
+relay_subbufs_consumed().  Any subsequent attempts to write into the
+buffer will again invoke the subbuf_start() callback with the same
+parameters; only when the consumer has consumed one or more of the
+ready sub-buffers will relay_buf_full() return 0, in which case the
+buffer switch can continue.
+
+The implementation of the subbuf_start() callback for 'overwrite' mode
+would be very similar::
+
+    static int subbuf_start(struct rchan_buf *buf,
+			    void *subbuf,
+			    void *prev_subbuf,
+			    size_t prev_padding)
+    {
+	    if (prev_subbuf)
+		    *((unsigned *)prev_subbuf) = prev_padding;
+
+	    subbuf_start_reserve(buf, sizeof(unsigned int));
+
+	    return 1;
+    }
+
+In this case, the relay_buf_full() check is meaningless and the
+callback always returns 1, causing the buffer switch to occur
+unconditionally.  It's also meaningless for the client to use the
+relay_subbufs_consumed() function in this mode, as it's never
+consulted.
+
+The default subbuf_start() implementation, used if the client doesn't
+define any callbacks, or doesn't define the subbuf_start() callback,
+implements the simplest possible 'no-overwrite' mode: it reserves no
+header space and returns 0 when relay_buf_full() is true, else 1.
+
+Header information can be reserved at the beginning of each sub-buffer
+by calling the subbuf_start_reserve() helper function from within the
+subbuf_start() callback.  This reserved area can be used to store
+whatever information the client wants.  In the example above, room is
+reserved in each sub-buffer to store the padding count for that
+sub-buffer.  This is filled in for the previous sub-buffer in the
+subbuf_start() implementation; the padding value for the previous
+sub-buffer is passed into the subbuf_start() callback along with a
+pointer to the previous sub-buffer, since the padding value isn't
+known until a sub-buffer is filled.  The subbuf_start() callback is
+also called for the first sub-buffer when the channel is opened, to
+give the client a chance to reserve space in it.  In this case the
+previous sub-buffer pointer passed into the callback will be NULL, so
+the client should check the value of the prev_subbuf pointer before
+writing into the previous sub-buffer.
+
+Writing to a channel
+--------------------
+
+Kernel clients write data into the current cpu's channel buffer using
+relay_write() or __relay_write().  relay_write() is the main logging
+function - it uses local_irq_save() to protect the buffer and should be
+used if you might be logging from interrupt context.  If you know
+you'll never be logging from interrupt context, you can use
+__relay_write(), which only disables preemption.  These functions
+don't return a value, so you can't determine whether or not they
+failed - the assumption is that you wouldn't want to check a return
+value in the fast logging path anyway, and that they'll always succeed
+unless the buffer is full and no-overwrite mode is being used, in
+which case you can detect a failed write in the subbuf_start()
+callback by calling the relay_buf_full() helper function.
+
+relay_reserve() is used to reserve a slot in a channel buffer which
+can be written to later.  This would typically be used in applications
+that need to write directly into a channel buffer without having to
+stage data in a temporary buffer beforehand.  Because the actual write
+may not happen immediately after the slot is reserved, applications
+using relay_reserve() can keep a count of the number of bytes actually
+written, either in space reserved in the sub-buffers themselves or as
+a separate array.  See the 'reserve' example in the relay-apps tarball
+at http://relayfs.sourceforge.net for an example of how this can be
+done.  Because the write is under control of the client and is
+separated from the reserve, relay_reserve() doesn't protect the buffer
+at all - it's up to the client to provide the appropriate
+synchronization when using relay_reserve().
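+
+A minimal sketch of the reserve-then-fill pattern follows.  The record
+layout and log_event() are hypothetical.  Because channel buffers are
+per-cpu, this sketch disables local interrupts around both the reserve
+and the fill, so no other writer on the same cpu can interleave.  A
+client that never logs from interrupt context might only need to
+disable preemption instead::
+
+    #include <linux/irqflags.h>
+    #include <linux/relay.h>
+    #include <linux/smp.h>
+    #include <linux/timekeeping.h>
+    #include <linux/types.h>
+
+    struct example_event {
+            u64 timestamp_ns;
+            u32 cpu;
+            u32 value;
+    };
+
+    static void log_event(struct rchan *chan, u32 value)
+    {
+            struct example_event *ev;
+            unsigned long flags;
+
+            local_irq_save(flags);
+            ev = relay_reserve(chan, sizeof(*ev));
+            if (ev) {       /* NULL means no room, e.g. buffer full */
+                    ev->timestamp_ns = ktime_get_ns();
+                    ev->cpu = smp_processor_id();
+                    ev->value = value;
+            }
+            local_irq_restore(flags);
+    }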
+
+Closing a channel
+-----------------
+
+The client calls relay_close() when it's finished using the channel.
+The channel and its associated buffers are destroyed when there are no
+longer any references to any of the channel buffers.  relay_flush()
+forces a sub-buffer switch on all the channel buffers, and can be used
+to finalize and process the last sub-buffers before the channel is
+closed.
+
+Misc
+----
+
+Some applications may want to keep a channel around and re-use it
+rather than open and close a new channel for each use.  relay_reset()
+can be used for this purpose - it resets a channel to its initial
+state without reallocating channel buffer memory or destroying
+existing mappings.  It should however only be called when it's safe to
+do so, i.e. when the channel isn't currently being written to.
+
+Finally, there are a couple of utility callbacks that can be used for
+different purposes.  buf_mapped() is called whenever a channel buffer
+is mmapped from user space and buf_unmapped() is called when it's
+unmapped.  The client can use this notification to trigger actions
+within the kernel application, such as enabling/disabling logging to
+the channel.
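+
+For instance, a client could count mapped buffers and only log while at
+least one reader has a buffer mapped.  The handlers would be registered
+through the .buf_mapped and .buf_unmapped members of struct
+rchan_callbacks.  The counter and function names below are
+hypothetical, and readers using read(2) are not counted.  This is only
+a rough gate.  The set of callbacks has also changed between kernel
+versions, so check include/linux/relay.h before relying on these two::
+
+    #include <linux/atomic.h>
+    #include <linux/fs.h>
+    #include <linux/relay.h>
+
+    static atomic_t nr_mapped = ATOMIC_INIT(0);
+
+    static void buf_mapped_handler(struct rchan_buf *buf, struct file *filp)
+    {
+            atomic_inc(&nr_mapped);
+    }
+
+    static void buf_unmapped_handler(struct rchan_buf *buf, struct file *filp)
+    {
+            atomic_dec(&nr_mapped);
+    }
+
+    /* Skip logging entirely while no buffer is mapped. */
+    static void maybe_log(struct rchan *chan, const void *data, size_t len)
+    {
+            if (atomic_read(&nr_mapped) > 0)
+                    relay_write(chan, data, len);
+    }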
+
+
+Resources
+=========
+
+For news, example code, mailing list, etc. see the relay interface homepage:
+
+    http://relayfs.sourceforge.net
+
+
+Credits
+=======
+
+The ideas and specs for the relay interface came about as a result of
+discussions on tracing involving the following:
+
+- Michel Dagenais <michel.dagenais@polymtl.ca>
+- Richard Moore <richardj_moore@uk.ibm.com>
+- Bob Wisniewski <bob@watson.ibm.com>
+- Karim Yaghmour <karim@opersys.com>
+- Tom Zanussi <zanussi@us.ibm.com>
+
+Also thanks to Hubertus Franke for a lot of useful suggestions and bug
+reports.
diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt
deleted file mode 100644
index cd709a94d054..000000000000
--- a/Documentation/filesystems/relay.txt
+++ /dev/null
@@ -1,494 +0,0 @@
-relay interface (formerly relayfs)
-==================================
-
-The relay interface provides a means for kernel applications to
-efficiently log and transfer large quantities of data from the kernel
-to userspace via user-defined 'relay channels'.
-
-A 'relay channel' is a kernel->user data relay mechanism implemented
-as a set of per-cpu kernel buffers ('channel buffers'), each
-represented as a regular file ('relay file') in user space.  Kernel
-clients write into the channel buffers using efficient write
-functions; these automatically log into the current cpu's channel
-buffer.  User space applications mmap() or read() from the relay files
-and retrieve the data as it becomes available.  The relay files
-themselves are files created in a host filesystem, e.g. debugfs, and
-are associated with the channel buffers using the API described below.
-
-The format of the data logged into the channel buffers is completely
-up to the kernel client; the relay interface does however provide
-hooks which allow kernel clients to impose some structure on the
-buffer data.  The relay interface doesn't implement any form of data
-filtering - this also is left to the kernel client.  The purpose is to
-keep things as simple as possible.
-
-This document provides an overview of the relay interface API.  The
-details of the function parameters are documented along with the
-functions in the relay interface code - please see that for details.
-
-Semantics
-=========
-
-Each relay channel has one buffer per CPU, each buffer has one or more
-sub-buffers.  Messages are written to the first sub-buffer until it is
-too full to contain a new message, in which case it is written to
-the next (if available).  Messages are never split across sub-buffers.
-At this point, userspace can be notified so it empties the first
-sub-buffer, while the kernel continues writing to the next.
-
-When notified that a sub-buffer is full, the kernel knows how many
-bytes of it are padding i.e. unused space occurring because a complete
-message couldn't fit into a sub-buffer.  Userspace can use this
-knowledge to copy only valid data.
-
-After copying it, userspace can notify the kernel that a sub-buffer
-has been consumed.
-
-A relay channel can operate in a mode where it will overwrite data not
-yet collected by userspace, and not wait for it to be consumed.
-
-The relay channel itself does not provide for communication of such
-data between userspace and kernel, allowing the kernel side to remain
-simple and not impose a single interface on userspace.  It does
-provide a set of examples and a separate helper though, described
-below.
-
-The read() interface both removes padding and internally consumes the
-read sub-buffers; thus in cases where read(2) is being used to drain
-the channel buffers, special-purpose communication between kernel and
-user isn't necessary for basic operation.
-
-One of the major goals of the relay interface is to provide a low
-overhead mechanism for conveying kernel data to userspace.  While the
-read() interface is easy to use, it's not as efficient as the mmap()
-approach; the example code attempts to make the tradeoff between the
-two approaches as small as possible.
-
-klog and relay-apps example code
-================================
-
-The relay interface itself is ready to use, but to make things easier,
-a couple simple utility functions and a set of examples are provided.
-
-The relay-apps example tarball, available on the relay sourceforge
-site, contains a set of self-contained examples, each consisting of a
-pair of .c files containing boilerplate code for each of the user and
-kernel sides of a relay application.  When combined these two sets of
-boilerplate code provide glue to easily stream data to disk, without
-having to bother with mundane housekeeping chores.
-
-The 'klog debugging functions' patch (klog.patch in the relay-apps
-tarball) provides a couple of high-level logging functions to the
-kernel which allow writing formatted text or raw data to a channel,
-regardless of whether a channel to write into exists or not, or even
-whether the relay interface is compiled into the kernel or not.  These
-functions allow you to put unconditional 'trace' statements anywhere
-in the kernel or kernel modules; only when there is a 'klog handler'
-registered will data actually be logged (see the klog and kleak
-examples for details).
-
-It is of course possible to use the relay interface from scratch,
-i.e. without using any of the relay-apps example code or klog, but
-you'll have to implement communication between userspace and kernel,
-allowing both to convey the state of buffers (full, empty, amount of
-padding).  The read() interface both removes padding and internally
-consumes the read sub-buffers; thus in cases where read(2) is being
-used to drain the channel buffers, special-purpose communication
-between kernel and user isn't necessary for basic operation.  Things
-such as buffer-full conditions would still need to be communicated via
-some channel though.
-
-klog and the relay-apps examples can be found in the relay-apps
-tarball on http://relayfs.sourceforge.net
-
-The relay interface user space API
-==================================
-
-The relay interface implements basic file operations for user space
-access to relay channel buffer data.  Here are the file operations
-that are available and some comments regarding their behavior:
-
-open()	    enables user to open an _existing_ channel buffer.
-
-mmap()      results in channel buffer being mapped into the caller's
-	    memory space. Note that you can't do a partial mmap - you
-	    must map the entire file, which is NRBUF * SUBBUFSIZE.
-
-read()      read the contents of a channel buffer.  The bytes read are
-	    'consumed' by the reader, i.e. they won't be available
-	    again to subsequent reads.  If the channel is being used
-	    in no-overwrite mode (the default), it can be read at any
-	    time even if there's an active kernel writer.  If the
-	    channel is being used in overwrite mode and there are
-	    active channel writers, results may be unpredictable -
-	    users should make sure that all logging to the channel has
-	    ended before using read() with overwrite mode.  Sub-buffer
-	    padding is automatically removed and will not be seen by
-	    the reader.
-
-sendfile()  transfer data from a channel buffer to an output file
-	    descriptor. Sub-buffer padding is automatically removed
-	    and will not be seen by the reader.
-
-poll()      POLLIN/POLLRDNORM/POLLERR supported.  User applications are
-	    notified when sub-buffer boundaries are crossed.
-
-close()     decrements the channel buffer's refcount.  When the refcount
-	    reaches 0, i.e. when no process or kernel client has the
-	    buffer open, the channel buffer is freed.
-
-In order for a user application to make use of relay files, the
-host filesystem must be mounted.  For example,
-
-	mount -t debugfs debugfs /sys/kernel/debug
-
-NOTE:   the host filesystem doesn't need to be mounted for kernel
-	clients to create or use channels - it only needs to be
-	mounted when user space applications need access to the buffer
-	data.
-
-
-The relay interface kernel API
-==============================
-
-Here's a summary of the API the relay interface provides to in-kernel clients:
-
-TBD(curr. line MT:/API/)
-  channel management functions:
-
-    relay_open(base_filename, parent, subbuf_size, n_subbufs,
-               callbacks, private_data)
-    relay_close(chan)
-    relay_flush(chan)
-    relay_reset(chan)
-
-  channel management typically called on instigation of userspace:
-
-    relay_subbufs_consumed(chan, cpu, subbufs_consumed)
-
-  write functions:
-
-    relay_write(chan, data, length)
-    __relay_write(chan, data, length)
-    relay_reserve(chan, length)
-
-  callbacks:
-
-    subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
-    buf_mapped(buf, filp)
-    buf_unmapped(buf, filp)
-    create_buf_file(filename, parent, mode, buf, is_global)
-    remove_buf_file(dentry)
-
-  helper functions:
-
-    relay_buf_full(buf)
-    subbuf_start_reserve(buf, length)
-
-
-Creating a channel
-------------------
-
-relay_open() is used to create a channel, along with its per-cpu
-channel buffers.  Each channel buffer will have an associated file
-created for it in the host filesystem, which can be and mmapped or
-read from in user space.  The files are named basename0...basenameN-1
-where N is the number of online cpus, and by default will be created
-in the root of the filesystem (if the parent param is NULL).  If you
-want a directory structure to contain your relay files, you should
-create it using the host filesystem's directory creation function,
-e.g. debugfs_create_dir(), and pass the parent directory to
-relay_open().  Users are responsible for cleaning up any directory
-structure they create, when the channel is closed - again the host
-filesystem's directory removal functions should be used for that,
-e.g. debugfs_remove().
-
-In order for a channel to be created and the host filesystem's files
-associated with its channel buffers, the user must provide definitions
-for two callback functions, create_buf_file() and remove_buf_file().
-create_buf_file() is called once for each per-cpu buffer from
-relay_open() and allows the user to create the file which will be used
-to represent the corresponding channel buffer.  The callback should
-return the dentry of the file created to represent the channel buffer.
-remove_buf_file() must also be defined; it's responsible for deleting
-the file(s) created in create_buf_file() and is called during
-relay_close().
-
-Here are some typical definitions for these callbacks, in this case
-using debugfs:
-
-/*
- * create_buf_file() callback.  Creates relay file in debugfs.
- */
-static struct dentry *create_buf_file_handler(const char *filename,
-                                              struct dentry *parent,
-                                              umode_t mode,
-                                              struct rchan_buf *buf,
-                                              int *is_global)
-{
-        return debugfs_create_file(filename, mode, parent, buf,
-	                           &relay_file_operations);
-}
-
-/*
- * remove_buf_file() callback.  Removes relay file from debugfs.
- */
-static int remove_buf_file_handler(struct dentry *dentry)
-{
-        debugfs_remove(dentry);
-
-        return 0;
-}
-
-/*
- * relay interface callbacks
- */
-static struct rchan_callbacks relay_callbacks =
-{
-        .create_buf_file = create_buf_file_handler,
-        .remove_buf_file = remove_buf_file_handler,
-};
-
-And an example relay_open() invocation using them:
-
-  chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL);
-
-If the create_buf_file() callback fails, or isn't defined, channel
-creation and thus relay_open() will fail.
-
-The total size of each per-cpu buffer is calculated by multiplying the
-number of sub-buffers by the sub-buffer size passed into relay_open().
-The idea behind sub-buffers is that they're basically an extension of
-double-buffering to N buffers, and they also allow applications to
-easily implement random-access-on-buffer-boundary schemes, which can
-be important for some high-volume applications.  The number and size
-of sub-buffers is completely dependent on the application and even for
-the same application, different conditions will warrant different
-values for these parameters at different times.  Typically, the right
-values to use are best decided after some experimentation; in general,
-though, it's safe to assume that having only 1 sub-buffer is a bad
-idea - you're guaranteed to either overwrite data or lose events
-depending on the channel mode being used.
-
-The create_buf_file() implementation can also be defined in such a way
-as to allow the creation of a single 'global' buffer instead of the
-default per-cpu set.  This can be useful for applications interested
-mainly in seeing the relative ordering of system-wide events without
-the need to bother with saving explicit timestamps for the purpose of
-merging/sorting per-cpu files in a postprocessing step.
-
-To have relay_open() create a global buffer, the create_buf_file()
-implementation should set the value of the is_global outparam to a
-non-zero value in addition to creating the file that will be used to
-represent the single buffer.  In the case of a global buffer,
-create_buf_file() and remove_buf_file() will be called only once.  The
-normal channel-writing functions, e.g. relay_write(), can still be
-used - writes from any cpu will transparently end up in the global
-buffer - but since it is a global buffer, callers should make sure
-they use the proper locking for such a buffer, either by wrapping
-writes in a spinlock, or by copying a write function from relay.h and
-creating a local version that internally does the proper locking.
-
-The private_data passed into relay_open() allows clients to associate
-user-defined data with a channel, and is immediately available
-(including in create_buf_file()) via chan->private_data or
-buf->chan->private_data.
-
-Buffer-only channels
---------------------
-
-These channels have no files associated and can be created with
-relay_open(NULL, NULL, ...). Such channels are useful in scenarios such
-as when doing early tracing in the kernel, before the VFS is up. In these
-cases, one may open a buffer-only channel and then call
-relay_late_setup_files() when the kernel is ready to handle files,
-to expose the buffered data to the userspace.
-
-Channel 'modes'
----------------
-
-relay channels can be used in either of two modes - 'overwrite' or
-'no-overwrite'.  The mode is entirely determined by the implementation
-of the subbuf_start() callback, as described below.  The default if no
-subbuf_start() callback is defined is 'no-overwrite' mode.  If the
-default mode suits your needs, and you plan to use the read()
-interface to retrieve channel data, you can ignore the details of this
-section, as it pertains mainly to mmap() implementations.
-
-In 'overwrite' mode, also known as 'flight recorder' mode, writes
-continuously cycle around the buffer and will never fail, but will
-unconditionally overwrite old data regardless of whether it's actually
-been consumed.  In no-overwrite mode, writes will fail, i.e. data will
-be lost, if the number of unconsumed sub-buffers equals the total
-number of sub-buffers in the channel.  It should be clear that if
-there is no consumer or if the consumer can't consume sub-buffers fast
-enough, data will be lost in either case; the only difference is
-whether data is lost from the beginning or the end of a buffer.
-
-As explained above, a relay channel is made of up one or more
-per-cpu channel buffers, each implemented as a circular buffer
-subdivided into one or more sub-buffers.  Messages are written into
-the current sub-buffer of the channel's current per-cpu buffer via the
-write functions described below.  Whenever a message can't fit into
-the current sub-buffer, because there's no room left for it, the
-client is notified via the subbuf_start() callback that a switch to a
-new sub-buffer is about to occur.  The client uses this callback to 1)
-initialize the next sub-buffer if appropriate 2) finalize the previous
-sub-buffer if appropriate and 3) return a boolean value indicating
-whether or not to actually move on to the next sub-buffer.
-
-To implement 'no-overwrite' mode, the userspace client would provide
-an implementation of the subbuf_start() callback something like the
-following:
-
-static int subbuf_start(struct rchan_buf *buf,
-                        void *subbuf,
-			void *prev_subbuf,
-			unsigned int prev_padding)
-{
-	if (prev_subbuf)
-		*((unsigned *)prev_subbuf) = prev_padding;
-
-	if (relay_buf_full(buf))
-		return 0;
-
-	subbuf_start_reserve(buf, sizeof(unsigned int));
-
-	return 1;
-}
-
-If the current buffer is full, i.e. all sub-buffers remain unconsumed,
-the callback returns 0 to indicate that the buffer switch should not
-occur yet, i.e. until the consumer has had a chance to read the
-current set of ready sub-buffers.  For the relay_buf_full() function
-to make sense, the consumer is responsible for notifying the relay
-interface when sub-buffers have been consumed via
-relay_subbufs_consumed().  Any subsequent attempts to write into the
-buffer will again invoke the subbuf_start() callback with the same
-parameters; only when the consumer has consumed one or more of the
-ready sub-buffers will relay_buf_full() return 0, in which case the
-buffer switch can continue.
-
-The implementation of the subbuf_start() callback for 'overwrite' mode
-would be very similar:
-
-static int subbuf_start(struct rchan_buf *buf,
-                        void *subbuf,
-			void *prev_subbuf,
-			size_t prev_padding)
-{
-	if (prev_subbuf)
-		*((unsigned *)prev_subbuf) = prev_padding;
-
-	subbuf_start_reserve(buf, sizeof(unsigned int));
-
-	return 1;
-}
-
-In this case, the relay_buf_full() check is meaningless and the
-callback always returns 1, causing the buffer switch to occur
-unconditionally.  It's also meaningless for the client to use the
-relay_subbufs_consumed() function in this mode, as it's never
-consulted.
-
-The default subbuf_start() implementation, used if the client doesn't
-define any callbacks, or doesn't define the subbuf_start() callback,
-implements the simplest possible 'no-overwrite' mode, i.e. it does
-nothing but return 0.
-
-Header information can be reserved at the beginning of each sub-buffer
-by calling the subbuf_start_reserve() helper function from within the
-subbuf_start() callback.  This reserved area can be used to store
-whatever information the client wants.  In the example above, room is
-reserved in each sub-buffer to store the padding count for that
-sub-buffer.  This is filled in for the previous sub-buffer in the
-subbuf_start() implementation; the padding value for the previous
-sub-buffer is passed into the subbuf_start() callback along with a
-pointer to the previous sub-buffer, since the padding value isn't
-known until a sub-buffer is filled.  The subbuf_start() callback is
-also called for the first sub-buffer when the channel is opened, to
-give the client a chance to reserve space in it.  In this case the
-previous sub-buffer pointer passed into the callback will be NULL, so
-the client should check the value of the prev_subbuf pointer before
-writing into the previous sub-buffer.
-
-Writing to a channel
---------------------
-
-Kernel clients write data into the current cpu's channel buffer using
-relay_write() or __relay_write().  relay_write() is the main logging
-function - it uses local_irqsave() to protect the buffer and should be
-used if you might be logging from interrupt context.  If you know
-you'll never be logging from interrupt context, you can use
-__relay_write(), which only disables preemption.  These functions
-don't return a value, so you can't determine whether or not they
-failed - the assumption is that you wouldn't want to check a return
-value in the fast logging path anyway, and that they'll always succeed
-unless the buffer is full and no-overwrite mode is being used, in
-which case you can detect a failed write in the subbuf_start()
-callback by calling the relay_buf_full() helper function.
-
-relay_reserve() is used to reserve a slot in a channel buffer which
-can be written to later.  This would typically be used in applications
-that need to write directly into a channel buffer without having to
-stage data in a temporary buffer beforehand.  Because the actual write
-may not happen immediately after the slot is reserved, applications
-using relay_reserve() can keep a count of the number of bytes actually
-written, either in space reserved in the sub-buffers themselves or as
-a separate array.  See the 'reserve' example in the relay-apps tarball
-at http://relayfs.sourceforge.net for an example of how this can be
-done.  Because the write is under control of the client and is
-separated from the reserve, relay_reserve() doesn't protect the buffer
-at all - it's up to the client to provide the appropriate
-synchronization when using relay_reserve().
-
-Closing a channel
------------------
-
-The client calls relay_close() when it's finished using the channel.
-The channel and its associated buffers are destroyed when there are no
-longer any references to any of the channel buffers.  relay_flush()
-forces a sub-buffer switch on all the channel buffers, and can be used
-to finalize and process the last sub-buffers before the channel is
-closed.
-
-Misc
-----
-
-Some applications may want to keep a channel around and re-use it
-rather than open and close a new channel for each use.  relay_reset()
-can be used for this purpose - it resets a channel to its initial
-state without reallocating channel buffer memory or destroying
-existing mappings.  It should however only be called when it's safe to
-do so, i.e. when the channel isn't currently being written to.
-
-Finally, there are a couple of utility callbacks that can be used for
-different purposes.  buf_mapped() is called whenever a channel buffer
-is mmapped from user space and buf_unmapped() is called when it's
-unmapped.  The client can use this notification to trigger actions
-within the kernel application, such as enabling/disabling logging to
-the channel.
-
-
-Resources
-=========
-
-For news, example code, mailing list, etc. see the relay interface homepage:
-
-    http://relayfs.sourceforge.net
-
-
-Credits
-=======
-
-The ideas and specs for the relay interface came about as a result of
-discussions on tracing involving the following:
-
-Michel Dagenais		<michel.dagenais@polymtl.ca>
-Richard Moore		<richardj_moore@uk.ibm.com>
-Bob Wisniewski		<bob@watson.ibm.com>
-Karim Yaghmour		<karim@opersys.com>
-Tom Zanussi		<zanussi@us.ibm.com>
-
-Also thanks to Hubertus Franke for a lot of useful suggestions and bug
-reports.
-- 
cgit 


From 6db0a480aa07ab65b6c7d34d095c714359af3e87 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:22 +0100
Subject: docs: filesystems: convert romfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/d2cc83e7cd6de63c793ccd3f2588ea40f7f1e764.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/romfs.rst | 194 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/romfs.txt | 186 ----------------------------------
 3 files changed, 195 insertions(+), 186 deletions(-)
 create mode 100644 Documentation/filesystems/romfs.rst
 delete mode 100644 Documentation/filesystems/romfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 0aade8146d4d..3b26639517af 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -85,5 +85,6 @@ Documentation for filesystem implementations.
    qnx6
    ramfs-rootfs-initramfs
    relay
+   romfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/romfs.rst b/Documentation/filesystems/romfs.rst
new file mode 100644
index 000000000000..465b11efa9be
--- /dev/null
+++ b/Documentation/filesystems/romfs.rst
@@ -0,0 +1,194 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+ROMFS - ROM File System
+=======================
+
+This is a quite dumb, read-only filesystem, mainly for the initial RAM
+disks of installation disks.  It grew out of the need to have modules
+linked at boot time.  Using this filesystem, you get a very similar
+feature, and even the possibility of a small kernel, with a file
+system which doesn't take up useful memory from the router functions
+in the basement of your office.
+
+For comparison, both the older minix and xiafs (the latter is now
+defunct) filesystems, compiled as modules, need more than 20000 bytes,
+while romfs is less than a page, about 4000 bytes (assuming i586
+code).  Under the same conditions, the msdos filesystem would need
+about 30K (and does not support device nodes or symlinks), while the
+nfs module with nfsroot is about 57K.  Furthermore, as a somewhat
+unfair comparison, an actual rescue disk used up 3202 blocks with
+ext2, while with romfs it needed 3079 blocks.
+
+To create such a file system, you'll need a user program named
+genromfs.  It is available at http://romfs.sourceforge.net/
+
+As the name suggests, romfs could also be used (space-efficiently) on
+various read-only media, like (E)EPROM disks, if someone has the
+motivation. :)
+
+However, the main purpose of romfs is to have a very small kernel,
+which has only this filesystem linked in, and then can load any module
+later, with the current module utilities.  It can also be used to run
+some program to decide if you need SCSI devices, and even IDE or
+floppy drivers can be loaded later if you use the "initrd" (initial
+RAM disk) feature of the kernel.  This is hardly news, but with
+romfs, you can even leave out your ext2 or minix or maybe even affs
+filesystem until you really know that you need it.
+
+For example, a distribution boot disk can contain only the CD-ROM
+drivers (and possibly the SCSI drivers), and the ISO 9660 filesystem
+module.  The kernel can be small enough, since it doesn't have other
+filesystems, like the quite large ext2fs module, which can then be
+loaded off the CD at a later stage of the installation.  Another use
+would be for a recovery disk, when you are reinstalling a workstation
+from the network, and you will have all the tools/modules available
+from a nearby server, so you don't want to carry two disks for this
+purpose, just because it won't fit into ext2.
+
+romfs operates on block devices as you would expect, and the underlying
+structure is very simple.  Every accessible structure begins on a 16
+byte boundary for fast access.  The minimum space a file will take
+is 32 bytes (this is an empty file, with a name of less than 16
+characters).  The maximum overhead for any non-empty file is the header,
+and the 16 byte padding for the name and the contents, i.e.
+16+14+15 = 45 bytes.  This is quite rare, however, since most file
+names are longer than 3 bytes and shorter than 15 bytes.
+
+The layout of the filesystem is the following::
+
+ offset	    content
+
+	+---+---+---+---+
+  0	| - | r | o | m |  \
+	+---+---+---+---+	The ASCII representation of those bytes
+  4	| 1 | f | s | - |  /	(i.e. "-rom1fs-")
+	+---+---+---+---+
+  8	|   full size	|	The number of accessible bytes in this fs.
+	+---+---+---+---+
+ 12	|    checksum	|	The checksum of the FIRST 512 BYTES.
+	+---+---+---+---+
+ 16	| volume name	|	The zero terminated name of the volume,
+	:               :	padded to 16 byte boundary.
+	+---+---+---+---+
+ xx	|     file	|
+	:    headers	:
+
+Every multi-byte value (32-bit words; I'll use the term longwords from
+now on) must be in big-endian order.
+
+The first eight bytes identify the filesystem, even for the casual
+inspector.  The 3rd longword contains the number of bytes accessible
+from the start of this filesystem.  The 4th longword is the checksum
+of the first 512 bytes (or the number of bytes accessible, whichever
+is smaller).  The applied algorithm is the same as in the AFFS
+filesystem, namely a simple sum of the longwords (assuming big-endian
+quantities again), with the checksum chosen so that the sum is zero.
+For details, please consult the source.  This algorithm was chosen
+because, although not very reliable, it needs no tables and is simple.
+
+The remaining bytes hold the file headers and data; each file header
+must begin on a 16 byte boundary::
+
+ offset	    content
+
+     	+---+---+---+---+
+  0	| next filehdr|X|	The offset of the next file header
+	+---+---+---+---+	  (zero if no more files)
+  4	|   spec.info	|	Info for directories/hard links/devices
+	+---+---+---+---+
+  8	|     size      |	The size of this file in bytes
+	+---+---+---+---+
+ 12	|   checksum	|	Covering the meta data, including the file
+	+---+---+---+---+	  name, and padding
+ 16	| file name     |	The zero terminated name of the file,
+	:               :	padded to 16 byte boundary
+	+---+---+---+---+
+ xx	| file data	|
+	:		:
+
+Since the file headers always begin at a 16 byte boundary, the lowest
+4 bits of the next filehdr pointer would always be zero.  These four
+bits are used for the mode information.  Bits 0..2 specify the type of
+the file, while bit 3 shows whether the file is executable or not.  All
+files are assumed to be world readable, and also world executable if
+this bit is set; the exception is character and block devices, which
+are never accessible to anyone other than the owner.  The owner of
+every file is user and group 0; this should never be a problem for
+the intended use.  The mapping of the 8 possible values to file types
+is the following:
+
+=====  ==============  ============================================
+Value  File type       spec.info means
+=====  ==============  ============================================
+  0    hard link       link destination [file header]
+  1    directory       first file's header
+  2    regular file    unused, must be zero [MBZ]
+  3    symbolic link   unused, MBZ (file data is the link content)
+  4    block device    16/16 bits major/minor number
+  5    char device     16/16 bits major/minor number
+  6    socket          unused, MBZ
+  7    fifo            unused, MBZ
+=====  ==============  ============================================
+
+Note that hard links are specifically marked in this filesystem, but
+they will behave as you would expect (i.e. share the inode number).
+Note also that it is your responsibility to avoid creating hard link
+loops, and to create all the . and .. links for directories.  This is
+normally done correctly by the genromfs program.  Please refrain from
+using the executable bits for special purposes on the socket and fifo
+special files; they may have other uses in the future.  Additionally,
+please remember that only regular files and symlinks are supposed to
+have a nonzero size field; it contains the number of bytes available
+directly after the (padded) file name.
+
+Another thing to note is that romfs works on file headers and data
+aligned to 16 byte boundaries, but most hardware devices and the block
+device drivers cannot handle data in units smaller than a block.
+To overcome this limitation, the whole size of the file system must be
+padded to a 1024 byte boundary.
+
+If you have any problems or suggestions concerning this file system,
+please contact me.  However, think twice before wanting me to add
+features and code, because the primary and most important advantage of
+this file system is the small code.  On the other hand, don't be
+alarmed, I'm not getting that much romfs related mail.  Now I can
+understand why Avery wrote poems in the ARCnet docs to get some more
+feedback. :)
+
+romfs also has a mailing list, and to date, it hasn't received any
+traffic, so you are welcome to join it to discuss your ideas. :)
+
+It's run by ezmlm, so you can subscribe to it by sending a message
+to romfs-subscribe@shadow.banki.hu; the content is irrelevant.
+
+Pending issues:
+
+- Permissions and owner information are pretty essential features of a
+  Un*x like system, but romfs does not provide the full possibilities.
+  I have never found this limiting, but others might.
+
+- The file system is read only, so it can be very small, but if you
+  want to write _anything_ to a file system, you still need a
+  writable file system, thus negating the size advantages.  Possible
+  solutions: implement write access as a compile-time option, or a new,
+  similarly small writable filesystem for RAM disks.
+
+- Since the files are only required to be aligned on a 16 byte
+  boundary, reading or executing files from the filesystem may
+  currently be suboptimal.  This might be resolved by reordering file
+  data to have most of it (i.e. except the start and the end) lying at
+  "natural" boundaries, so that a big portion of the file contents
+  could be mapped directly into the mm subsystem.
+
+- Compression might be a useful feature, but memory is quite a
+  limiting factor in my eyes.
+
+- Where is it used?
+
+- Does it work on architectures other than Intel and Motorola?
+
+
+Have fun,
+
+Janos Farkas <chexum@shadow.banki.hu>
diff --git a/Documentation/filesystems/romfs.txt b/Documentation/filesystems/romfs.txt
deleted file mode 100644
index e2b07cc9120a..000000000000
--- a/Documentation/filesystems/romfs.txt
+++ /dev/null
@@ -1,186 +0,0 @@
-ROMFS - ROM FILE SYSTEM
-
-This is a quite dumb, read only filesystem, mainly for initial RAM
-disks of installation disks.  It has grown up by the need of having
-modules linked at boot time.  Using this filesystem, you get a very
-similar feature, and even the possibility of a small kernel, with a
-file system which doesn't take up useful memory from the router
-functions in the basement of your office.
-
-For comparison, both the older minix and xiafs (the latter is now
-defunct) filesystems, compiled as module need more than 20000 bytes,
-while romfs is less than a page, about 4000 bytes (assuming i586
-code).  Under the same conditions, the msdos filesystem would need
-about 30K (and does not support device nodes or symlinks), while the
-nfs module with nfsroot is about 57K.  Furthermore, as a bit unfair
-comparison, an actual rescue disk used up 3202 blocks with ext2, while
-with romfs, it needed 3079 blocks.
-
-To create such a file system, you'll need a user program named
-genromfs. It is available on http://romfs.sourceforge.net/
-
-As the name suggests, romfs could be also used (space-efficiently) on
-various read-only media, like (E)EPROM disks if someone will have the
-motivation.. :)
-
-However, the main purpose of romfs is to have a very small kernel,
-which has only this filesystem linked in, and then can load any module
-later, with the current module utilities.  It can also be used to run
-some program to decide if you need SCSI devices, and even IDE or
-floppy drives can be loaded later if you use the "initrd"--initial
-RAM disk--feature of the kernel.  This would not be really news
-flash, but with romfs, you can even spare off your ext2 or minix or
-maybe even affs filesystem until you really know that you need it.
-
-For example, a distribution boot disk can contain only the cd disk
-drivers (and possibly the SCSI drivers), and the ISO 9660 filesystem
-module.  The kernel can be small enough, since it doesn't have other
-filesystems, like the quite large ext2fs module, which can then be
-loaded off the CD at a later stage of the installation.  Another use
-would be for a recovery disk, when you are reinstalling a workstation
-from the network, and you will have all the tools/modules available
-from a nearby server, so you don't want to carry two disks for this
-purpose, just because it won't fit into ext2.
-
-romfs operates on block devices as you can expect, and the underlying
-structure is very simple.  Every accessible structure begins on 16
-byte boundaries for fast access.  The minimum space a file will take
-is 32 bytes (this is an empty file, with a less than 16 character
-name).  The maximum overhead for any non-empty file is the header, and
-the 16 byte padding for the name and the contents, also 16+14+15 = 45
-bytes.  This is quite rare however, since most file names are longer
-than 3 bytes, and shorter than 15 bytes.
-
-The layout of the filesystem is the following:
-
-offset	    content
-
-	+---+---+---+---+
-  0	| - | r | o | m |  \
-	+---+---+---+---+	The ASCII representation of those bytes
-  4	| 1 | f | s | - |  /	(i.e. "-rom1fs-")
-	+---+---+---+---+
-  8	|   full size	|	The number of accessible bytes in this fs.
-	+---+---+---+---+
- 12	|    checksum	|	The checksum of the FIRST 512 BYTES.
-	+---+---+---+---+
- 16	| volume name	|	The zero terminated name of the volume,
-	:               :	padded to 16 byte boundary.
-	+---+---+---+---+
- xx	|     file	|
-	:    headers	:
-
-Every multi byte value (32 bit words, I'll use the longwords term from
-now on) must be in big endian order.
-
-The first eight bytes identify the filesystem, even for the casual
-inspector.  After that, in the 3rd longword, it contains the number of
-bytes accessible from the start of this filesystem.  The 4th longword
-is the checksum of the first 512 bytes (or the number of bytes
-accessible, whichever is smaller).  The applied algorithm is the same
-as in the AFFS filesystem, namely a simple sum of the longwords
-(assuming bigendian quantities again).  For details, please consult
-the source.  This algorithm was chosen because although it's not quite
-reliable, it does not require any tables, and it is very simple.
-
-The following bytes are now part of the file system; each file header
-must begin on a 16 byte boundary.
-
-offset	    content
-
-     	+---+---+---+---+
-  0	| next filehdr|X|	The offset of the next file header
-	+---+---+---+---+	  (zero if no more files)
-  4	|   spec.info	|	Info for directories/hard links/devices
-	+---+---+---+---+
-  8	|     size      |	The size of this file in bytes
-	+---+---+---+---+
- 12	|   checksum	|	Covering the meta data, including the file
-	+---+---+---+---+	  name, and padding
- 16	| file name     |	The zero terminated name of the file,
-	:               :	padded to 16 byte boundary
-	+---+---+---+---+
- xx	| file data	|
-	:		:
-
-Since the file headers begin always at a 16 byte boundary, the lowest
-4 bits would be always zero in the next filehdr pointer.  These four
-bits are used for the mode information.  Bits 0..2 specify the type of
-the file; while bit 4 shows if the file is executable or not.  The
-permissions are assumed to be world readable, if this bit is not set,
-and world executable if it is; except the character and block devices,
-they are never accessible for other than owner.  The owner of every
-file is user and group 0, this should never be a problem for the
-intended use.  The mapping of the 8 possible values to file types is
-the following:
-
-	  mapping		spec.info means
- 0	hard link	link destination [file header]
- 1	directory	first file's header
- 2	regular file	unused, must be zero [MBZ]
- 3	symbolic link	unused, MBZ (file data is the link content)
- 4	block device	16/16 bits major/minor number
- 5	char device		    - " -
- 6	socket		unused, MBZ
- 7	fifo		unused, MBZ
-
-Note that hard links are specifically marked in this filesystem, but
-they will behave as you can expect (i.e. share the inode number).
-Note also that it is your responsibility to not create hard link
-loops, and creating all the . and .. links for directories.  This is
-normally done correctly by the genromfs program.  Please refrain from
-using the executable bits for special purposes on the socket and fifo
-special files, they may have other uses in the future.  Additionally,
-please remember that only regular files, and symlinks are supposed to
-have a nonzero size field; they contain the number of bytes available
-directly after the (padded) file name.
-
-Another thing to note is that romfs works on file headers and data
-aligned to 16 byte boundaries, but most hardware devices and the block
-device drivers are unable to cope with smaller than block-sized data.
-To overcome this limitation, the whole size of the file system must be
-padded to an 1024 byte boundary.
-
-If you have any problems or suggestions concerning this file system,
-please contact me.  However, think twice before wanting me to add
-features and code, because the primary and most important advantage of
-this file system is the small code.  On the other hand, don't be
-alarmed, I'm not getting that much romfs related mail.  Now I can
-understand why Avery wrote poems in the ARCnet docs to get some more
-feedback. :)
-
-romfs has also a mailing list, and to date, it hasn't received any
-traffic, so you are welcome to join it to discuss your ideas. :)
-
-It's run by ezmlm, so you can subscribe to it by sending a message
-to romfs-subscribe@shadow.banki.hu, the content is irrelevant.
-
-Pending issues:
-
-- Permissions and owner information are pretty essential features of a
-Un*x like system, but romfs does not provide the full possibilities.
-I have never found this limiting, but others might.
-
-- The file system is read only, so it can be very small, but in case
-one would want to write _anything_ to a file system, he still needs
-a writable file system, thus negating the size advantages.  Possible
-solutions: implement write access as a compile-time option, or a new,
-similarly small writable filesystem for RAM disks.
-
-- Since the files are only required to have alignment on a 16 byte
-boundary, it is currently possibly suboptimal to read or execute files
-from the filesystem.  It might be resolved by reordering file data to
-have most of it (i.e. except the start and the end) laying at "natural"
-boundaries, thus it would be possible to directly map a big portion of
-the file contents to the mm subsystem.
-
-- Compression might be an useful feature, but memory is quite a
-limiting factor in my eyes.
-
-- Where it is used?
-
-- Does it work on other architectures than intel and motorola?
-
-
-Have fun,
-Janos Farkas <chexum@shadow.banki.hu>
-- 


From 31771f45c8e46d9356f1a58329f5cd40ab331e1a Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:23 +0100
Subject: docs: filesystems: convert squashfs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/cec30862c7ee7de7f9cd903e35e6c8bf74cc928a.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst    |   1 +
 Documentation/filesystems/squashfs.rst | 265 +++++++++++++++++++++++++++++++++
 Documentation/filesystems/squashfs.txt | 259 --------------------------------
 3 files changed, 266 insertions(+), 259 deletions(-)
 create mode 100644 Documentation/filesystems/squashfs.rst
 delete mode 100644 Documentation/filesystems/squashfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 3b26639517af..97a5f65ae509 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -86,5 +86,6 @@ Documentation for filesystem implementations.
    ramfs-rootfs-initramfs
    relay
    romfs
+   squashfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/squashfs.rst b/Documentation/filesystems/squashfs.rst
new file mode 100644
index 000000000000..df42106bae71
--- /dev/null
+++ b/Documentation/filesystems/squashfs.rst
@@ -0,0 +1,265 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Squashfs 4.0 Filesystem
+=======================
+
+Squashfs is a compressed read-only filesystem for Linux.
+
+It uses zlib, lz4, lzo, or xz compression to compress files, inodes and
+directories.  Inodes in the system are very small and all blocks are packed to
+minimise data overhead.  Block sizes greater than 4K are supported, up to a
+maximum of 1 Mbyte (the default block size is 128K).
+
+Squashfs is intended for general read-only filesystem use, for archival
+use (i.e. in cases where a .tar.gz file may be used), and in constrained
+block device/memory systems (e.g. embedded systems) where low overhead is
+needed.
+
+:Mailing list: squashfs-devel@lists.sourceforge.net
+:Web site: www.squashfs.org
+
+1. Filesystem Features
+----------------------
+
+Squashfs filesystem features versus Cramfs:
+
+============================== ========= ==========
+Feature                        Squashfs  Cramfs
+============================== ========= ==========
+Max filesystem size            2^64      256 MiB
+Max file size                  ~ 2 TiB   16 MiB
+Max files                      unlimited unlimited
+Max directories                unlimited unlimited
+Max entries per directory      unlimited unlimited
+Max block size                 1 MiB     4 KiB
+Metadata compression           yes       no
+Directory indexes              yes       no
+Sparse file support            yes       no
+Tail-end packing (fragments)   yes       no
+Exportable (NFS etc.)          yes       no
+Hard link support              yes       no
+"." and ".." in readdir        yes       no
+Real inode numbers             yes       no
+32-bit uids/gids               yes       no
+File creation time             yes       no
+Xattr support                  yes       no
+ACL support                    no        no
+============================== ========= ==========
+
+Squashfs compresses data, inodes and directories.  In addition, inode and
+directory data are highly compacted, and packed on byte boundaries.  Each
+compressed inode is on average 8 bytes in length (the exact length varies with
+file type, i.e. regular file, directory, symbolic link, and block/char device
+inodes have different sizes).
+
+2. Using Squashfs
+-----------------
+
+As squashfs is a read-only filesystem, the mksquashfs program must be used to
+create populated squashfs filesystems.  This and other squashfs utilities
+can be obtained from http://www.squashfs.org, which also provides usage
+instructions.
+
+The squashfs-tools development tree is now located on kernel.org::
+
+	git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git
+
+3. Squashfs Filesystem Design
+-----------------------------
+
+A squashfs filesystem consists of a maximum of nine parts, packed together with
+byte alignment::
+
+	 ---------------
+	|  superblock 	|
+	|---------------|
+	|  compression  |
+	|    options    |
+	|---------------|
+	|  datablocks   |
+	|  & fragments  |
+	|---------------|
+	|  inode table	|
+	|---------------|
+	|   directory	|
+	|     table     |
+	|---------------|
+	|   fragment	|
+	|    table      |
+	|---------------|
+	|    export     |
+	|    table      |
+	|---------------|
+	|    uid/gid	|
+	|  lookup table	|
+	|---------------|
+	|     xattr     |
+	|     table	|
+	 ---------------
+
+Compressed data blocks are written to the filesystem as files are read from
+the source directory, and checked for duplicates.  Once all file data has been
+written, the completed inode, directory, fragment, export, uid/gid lookup and
+xattr tables are written.
+
+3.1 Compression options
+-----------------------
+
+Compressors can optionally support compression-specific options (e.g.
+dictionary size).  If non-default compression options have been used, they
+are stored in this part of the filesystem.
+
+3.2 Inodes
+----------
+
+Metadata (inodes and directories) are compressed in 8 Kbyte blocks.  Each
+compressed block is prefixed by a two-byte length; the top bit is set if the
+block is uncompressed.  A block is stored uncompressed if the mksquashfs -noI
+option is set, or if the compressed block would be larger than the
+uncompressed block.
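+
+As an illustration only, the two-byte length prefix could be decoded as in the
+following sketch.  It assumes the prefix is stored little-endian, which the
+text above does not state, and the names ``meta_header`` and
+``decode_meta_header()`` are hypothetical, not kernel identifiers::
+
+    #include <stdbool.h>
+    #include <stdint.h>
+
+    /* Top bit of the two-byte length prefix: set when the metadata block
+     * that follows is stored uncompressed. */
+    #define META_UNCOMPRESSED_BIT 0x8000u
+
+    struct meta_header {
+        uint16_t length;       /* on-disk length of the following block */
+        bool     uncompressed; /* block is stored as-is, not compressed */
+    };
+
+    /* Decode the prefix from its two raw bytes (assumed little-endian). */
+    static struct meta_header decode_meta_header(const uint8_t raw[2])
+    {
+        uint16_t word = (uint16_t)(raw[0] | (raw[1] << 8));
+        struct meta_header h;
+
+        h.uncompressed = (word & META_UNCOMPRESSED_BIT) != 0;
+        h.length = (uint16_t)(word & ~META_UNCOMPRESSED_BIT);
+        return h;
+    }
+
+Under this assumption the bytes ``0x00 0x90`` decode to an uncompressed block
+of 0x1000 bytes.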
+
+Inodes are packed into the metadata blocks and are not aligned to block
+boundaries, so an inode may span two compressed blocks.  Inodes are
+identified by a 48-bit number which encodes the location of the compressed
+metadata block containing the inode, and the byte offset into the
+decompressed block where the inode is placed (<block, offset>).
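+
+A minimal sketch of packing and unpacking such a 48-bit <block, offset>
+reference follows.  The split it uses (block location above bit 16, a 16-bit
+offset in the low bits) is an assumption believed to match the kernel's
+SQUASHFS_INODE_BLK()/SQUASHFS_INODE_OFFSET() macros, not something stated
+above, and the helper names are hypothetical::
+
+    #include <stdint.h>
+
+    /* Assumed layout: block in bits 16..47, offset in bits 0..15. */
+    static uint64_t make_inode_ref(uint32_t block, uint16_t offset)
+    {
+        return ((uint64_t)block << 16) | offset;
+    }
+
+    /* Location of the compressed metadata block holding the inode. */
+    static uint32_t inode_ref_block(uint64_t ref)
+    {
+        return (uint32_t)(ref >> 16);
+    }
+
+    /* Byte offset of the inode inside the decompressed metadata block. */
+    static uint16_t inode_ref_offset(uint64_t ref)
+    {
+        return (uint16_t)(ref & 0xffff);
+    }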
+
+To maximise compression there are different inodes for each file type
+(regular file, directory, device, etc.), the inode contents and length
+varying with the type.
+
+To further maximise compression, two types of regular file inode and
+directory inode are defined: inodes optimised for frequently occurring
+regular files and directories, and extended types where extra
+information has to be stored.
+
+3.3 Directories
+---------------
+
+Like inodes, directories are packed into compressed metadata blocks, stored
+in a directory table.  Directories are accessed using the start address of
+the metablock containing the directory and the offset into the
+decompressed block (<block, offset>).
+
+Directories are organised in a slightly complex way, and are not simply
+a list of file names.  The organisation takes advantage of the
+fact that (in most cases) the inodes of the files will be in the same
+compressed metadata block, and therefore, can share the start block.
+Directories are therefore organised in a two level list: a directory
+header containing the shared start block value, followed by a sequence of
+directory entries, each of which shares that start block.  A new directory
+header is written whenever the inode start block changes.  The directory
+header/directory entry list is repeated as many times as necessary.
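+
+The two-level layout can be modelled with the following sketch, which counts
+how many directory headers a run of entries in on-disk order needs.  The
+``dir_entry`` structure is a hypothetical in-memory view, not the on-disk
+format, and any limit the real format may place on the number of entries per
+header is ignored::
+
+    #include <stddef.h>
+    #include <stdint.h>
+
+    /* Hypothetical in-memory view of one directory entry. */
+    struct dir_entry {
+        const char *name;
+        uint32_t    inode_block; /* metadata block holding the inode */
+    };
+
+    /* A header is needed for the first entry and again whenever the
+     * inode start block differs from that of the previous entry. */
+    static size_t count_dir_headers(const struct dir_entry *e, size_t n)
+    {
+        size_t headers = 0;
+
+        for (size_t i = 0; i < n; i++)
+            if (i == 0 || e[i].inode_block != e[i - 1].inode_block)
+                headers++;
+        return headers;
+    }
+
+Entries whose inodes live in blocks 0, 0, 8192, 8192 and 0 would therefore
+need three headers.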
+
+Directories are sorted, and can contain a directory index to speed up
+file lookup.  Directory indexes store one entry per metablock, each entry
+storing the index/filename mapping to the first directory header
+in each metadata block.  Directories are sorted in alphabetical order,
+and at lookup the index is scanned linearly looking for the first filename
+alphabetically larger than the filename being looked up.  The index entry
+preceding it gives the location of the metadata block containing the filename.
+The general idea of the index is to ensure only one metadata block needs to be
+decompressed to do a lookup irrespective of the length of the directory.
+This scheme has the advantage that it doesn't require extra memory overhead
+and doesn't require much extra storage on disk.
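+
+A sketch of the index scan described above follows.  It assumes the index has
+already been read into memory as an array of hypothetical ``dir_index``
+entries, and that "alphabetical" means byte-wise strcmp() order::
+
+    #include <stddef.h>
+    #include <string.h>
+
+    /* Hypothetical in-memory directory index entry. */
+    struct dir_index {
+        const char *name;  /* first filename in the metadata block */
+        unsigned    block; /* where that metadata block starts */
+    };
+
+    /* Return the position of the index entry whose metadata block can
+     * contain @target: the last entry before the first name that sorts
+     * after @target.  Returns -1 if no entry sorts at or before @target,
+     * in which case the search starts from the directory's first block. */
+    static int dir_index_lookup(const struct dir_index *idx, size_t n,
+                                const char *target)
+    {
+        int found = -1;
+
+        for (size_t i = 0; i < n; i++) {
+            if (strcmp(idx[i].name, target) > 0)
+                break;
+            found = (int)i;
+        }
+        return found;
+    }
+
+Only the single metadata block selected this way then has to be decompressed
+and searched.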
+
+3.4 File data
+-------------
+
+Regular files consist of a sequence of contiguous compressed blocks, and/or a
+compressed fragment block (tail-end packed block).  The compressed size
+of each datablock is stored in a block list contained within the
+file inode.
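+
+For example, the number of block-list entries for a file can be derived from
+its size and the block size.  This sketch assumes that, when tail-end packing
+is used, only the full blocks appear in the block list and the remainder goes
+into a fragment.  The helper names are hypothetical::
+
+    #include <stdbool.h>
+    #include <stdint.h>
+
+    /* Hypothetical helper: datablocks (block-list entries) for a file. */
+    static uint64_t file_block_count(uint64_t size, uint32_t block_size,
+                                     bool tail_in_fragment)
+    {
+        if (tail_in_fragment)
+            return size / block_size;                /* full blocks only */
+        return (size + block_size - 1) / block_size; /* round up */
+    }
+
+    /* Hypothetical helper: bytes left after the full blocks (the tail). */
+    static uint32_t file_tail_bytes(uint64_t size, uint32_t block_size)
+    {
+        return (uint32_t)(size % block_size);
+    }
+
+With the default 128K block size, a 300000 byte file has two full blocks and
+a 37856 byte tail.  That gives two block-list entries plus a fragment, or
+three block-list entries without tail-end packing.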
+
+To speed up access to datablocks when reading 'large' files (256 Mbytes or
+larger), the code implements an index cache that caches the mapping from
+block index to datablock location on disk.
+
+The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
+retaining a simple and space-efficient block list on disk.  The cache
+is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
+Larger files use multiple slots, with 1.75 TiB files using all 8 slots
+(8 x 224 GiB = 1792 GiB = 1.75 TiB).
+The index cache is designed to be memory efficient, and by default uses
+16 KiB.
+
+3.5 Fragment lookup table
+-------------------------
+
+Regular files can contain a fragment index which is mapped to a fragment
+location on disk and compressed size using a fragment lookup table.  This
+fragment lookup table is itself stored compressed into metadata blocks.
+A second index table is used to locate these.  Because it is small, and for
+speed of access, this second index table is read at mount time and cached
+in memory.
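+
+Finding an entry in such a two-level table reduces to simple arithmetic,
+because fixed-size entries are packed into consecutive 8K metadata blocks.  In
+this sketch the 16-byte fragment entry size is an assumption (it is not given
+above), and the names are hypothetical.  The returned ``index_slot`` selects an
+entry in the cached second index table, which holds the location of a
+metadata block.  ``offset`` is the position of the wanted entry once that
+block has been decompressed::
+
+    #include <stdint.h>
+
+    #define METADATA_BLOCK_SIZE 8192u /* uncompressed metadata block size */
+    #define FRAGMENT_ENTRY_SIZE 16u   /* assumed size of one fragment entry */
+
+    struct table_pos {
+        uint32_t index_slot; /* entry in the cached second index table */
+        uint32_t offset;     /* byte offset within that metadata block */
+    };
+
+    /* Locate entry @n of a table of @entry_size-byte entries. */
+    static struct table_pos locate_table_entry(uint32_t n,
+                                               uint32_t entry_size)
+    {
+        uint64_t byte = (uint64_t)n * entry_size;
+        struct table_pos pos = {
+            .index_slot = (uint32_t)(byte / METADATA_BLOCK_SIZE),
+            .offset     = (uint32_t)(byte % METADATA_BLOCK_SIZE),
+        };
+
+        return pos;
+    }
+
+With 16-byte entries each metadata block holds 512 fragment entries, so
+fragment 1000 sits at offset 7808 of the block referenced by index slot 1.
+The same calculation would apply to the other tables located through a second
+index table, using their own entry sizes.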
+
+3.6 Uid/gid lookup table
+------------------------
+
+For space efficiency, regular files store uid and gid indexes, which are
+converted to 32-bit uids/gids using an id lookup table.  This table is
+stored compressed into metadata blocks.  A second index table is used to
+locate these.  Because it is small, and for speed of access, this second
+index table is read at mount time and cached in memory.
+
+3.7 Export table
+----------------
+
+To make Squashfs filesystems exportable (via NFS etc.), a filesystem can
+optionally contain an inode number to inode disk location lookup table
+(it is omitted when the -no-exports Mksquashfs option is used).  This table
+is required to enable Squashfs to map inode numbers passed in filehandles to
+the inode location on disk, which is necessary when the export code
+reinstantiates expired/flushed inodes.
+
+This table is stored compressed into metadata blocks.  A second index table is
+used to locate these.  Because it is small, and for speed of access, this
+second index table is read at mount time and cached in memory.
+
+3.8 Xattr table
+---------------
+
+The xattr table contains extended attributes for each inode.  The xattrs
+for each inode are stored in a list, each list entry containing a type,
+name and value field.  The type field encodes the xattr prefix
+("user.", "trusted." etc.) and also how the name/value fields
+should be interpreted.  Currently the type indicates whether the value
+is stored inline (in which case the value field contains the xattr value),
+or out of line (in which case the value field stores a reference to where
+the actual value is stored).  Storing large values out of line improves
+scanning and lookup performance.  It also allows values to be de-duplicated:
+the value is stored once, and all other occurrences hold an out of line
+reference to it.
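+
+Decoding the type field might look like the sketch below.  Several constant
+values are assumptions, not given above: the prefix id sits in the low 8 bits
+(0 = "user.", 1 = "trusted.", 2 = "security.") and the out-of-line flag is
+0x100.  They are believed to match the SQUASHFS_XATTR_* definitions in the
+kernel's fs/squashfs/squashfs_fs.h, but should be checked against that
+header::
+
+    #include <stdbool.h>
+    #include <stddef.h>
+    #include <stdint.h>
+
+    #define XATTR_PREFIX_MASK 0x00ffu /* assumed: low 8 bits select prefix */
+    #define XATTR_VALUE_OOL   0x0100u /* assumed: value stored out of line */
+
+    static const char *const xattr_prefixes[] = {
+        "user.", "trusted.", "security.",
+    };
+
+    /* Name prefix encoded in @type, or NULL if the id is unknown. */
+    static const char *xattr_type_prefix(uint16_t type)
+    {
+        unsigned int id = type & XATTR_PREFIX_MASK;
+
+        if (id >= sizeof(xattr_prefixes) / sizeof(xattr_prefixes[0]))
+            return NULL;
+        return xattr_prefixes[id];
+    }
+
+    /* True if the value field holds a reference, not the value itself. */
+    static bool xattr_value_is_ool(uint16_t type)
+    {
+        return (type & XATTR_VALUE_OOL) != 0;
+    }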
+
+The xattr lists are packed into compressed 8K metadata blocks.
+To reduce overhead in inodes, rather than storing the on-disk
+location of the xattr list inside each inode, a 32-bit xattr id
+is stored.  This xattr id is mapped into the location of the xattr
+list using a second xattr id lookup table.
+
+4. TODOs and Outstanding Issues
+-------------------------------
+
+4.1 TODO list
+-------------
+
+Implement ACL support.
+
+4.2 Squashfs Internal Cache
+---------------------------
+
+Blocks in Squashfs are compressed.  To avoid repeatedly decompressing
+recently accessed data, Squashfs uses two small caches, one for metadata and
+one for fragments.
+
+The cache is not used for file datablocks; these are decompressed and cached in
+the page cache in the normal way.  The cache is used to temporarily cache
+fragment and metadata blocks which have been read as a result of a metadata
+(i.e. inode or directory) or fragment access.  Because metadata and fragments
+are packed together into blocks (to gain greater compression), the read of a
+particular piece of metadata or fragment will retrieve other metadata/fragments
+which have been packed with it, and because of locality of reference these may
+be read in the near future.  Temporarily caching them ensures they are available
+for near future access without requiring an additional read and decompress.
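+
+The idea can be illustrated with a deliberately simplified cache.  This is
+not the kernel's implementation.  The slot count, block size, round-robin
+replacement and the ``fake_fill()`` stand-in for "read and decompress" are
+all hypothetical::
+
+    #include <stdbool.h>
+    #include <stddef.h>
+    #include <stdint.h>
+    #include <string.h>
+
+    #define CACHE_SLOTS 8    /* hypothetical number of cached blocks */
+    #define CACHE_BLOCK 8192 /* hypothetical decompressed block size */
+
+    struct cache_slot {
+        bool     valid;
+        uint64_t block;             /* on-disk location of the block */
+        uint8_t  data[CACHE_BLOCK]; /* decompressed contents */
+    };
+
+    struct block_cache {
+        struct cache_slot slot[CACHE_SLOTS];
+        size_t next_victim;         /* round-robin replacement index */
+    };
+
+    typedef void (*fill_fn)(uint64_t block, uint8_t *out);
+
+    static unsigned int fills;      /* simulated disk reads, for illustration */
+
+    /* Hypothetical stand-in for reading and decompressing @block. */
+    static void fake_fill(uint64_t block, uint8_t *out)
+    {
+        memset(out, (int)(block & 0xff), CACHE_BLOCK);
+        fills++;
+    }
+
+    /* Return @block's decompressed data, reading it only on a miss. */
+    static const uint8_t *cache_get(struct block_cache *c, uint64_t block,
+                                    fill_fn fill)
+    {
+        struct cache_slot *s;
+
+        for (size_t i = 0; i < CACHE_SLOTS; i++)
+            if (c->slot[i].valid && c->slot[i].block == block)
+                return c->slot[i].data; /* hit: no read or decompress */
+
+        s = &c->slot[c->next_victim];
+        c->next_victim = (c->next_victim + 1) % CACHE_SLOTS;
+        fill(block, s->data);
+        s->block = block;
+        s->valid = true;
+        return s->data;
+    }
+
+A second request for the same block is then served from memory, which is the
+saving described in the paragraph above.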
+
+In the future this internal cache may be replaced with an implementation which
+uses the kernel page cache.  Because the page cache operates on page-sized
+units, this may introduce additional complexity in terms of locking and
+associated race conditions.
diff --git a/Documentation/filesystems/squashfs.txt b/Documentation/filesystems/squashfs.txt
deleted file mode 100644
index e5274f84dc56..000000000000
--- a/Documentation/filesystems/squashfs.txt
+++ /dev/null
@@ -1,259 +0,0 @@
-SQUASHFS 4.0 FILESYSTEM
-=======================
-
-Squashfs is a compressed read-only filesystem for Linux.
-It uses zlib, lz4, lzo, or xz compression to compress files, inodes and
-directories.  Inodes in the system are very small and all blocks are packed to
-minimise data overhead. Block sizes greater than 4K are supported up to a
-maximum of 1Mbytes (default block size 128K).
-
-Squashfs is intended for general read-only filesystem use, for archival
-use (i.e. in cases where a .tar.gz file may be used), and in constrained
-block device/memory systems (e.g. embedded systems) where low overhead is
-needed.
-
-Mailing list: squashfs-devel@lists.sourceforge.net
-Web site: www.squashfs.org
-
-1. FILESYSTEM FEATURES
-----------------------
-
-Squashfs filesystem features versus Cramfs:
-
-				Squashfs		Cramfs
-
-Max filesystem size:		2^64			256 MiB
-Max file size:			~ 2 TiB			16 MiB
-Max files:			unlimited		unlimited
-Max directories:		unlimited		unlimited
-Max entries per directory:	unlimited		unlimited
-Max block size:			1 MiB			4 KiB
-Metadata compression:		yes			no
-Directory indexes:		yes			no
-Sparse file support:		yes			no
-Tail-end packing (fragments):	yes			no
-Exportable (NFS etc.):		yes			no
-Hard link support:		yes			no
-"." and ".." in readdir:	yes			no
-Real inode numbers:		yes			no
-32-bit uids/gids:		yes			no
-File creation time:		yes			no
-Xattr support:			yes			no
-ACL support:			no			no
-
-Squashfs compresses data, inodes and directories.  In addition, inode and
-directory data are highly compacted, and packed on byte boundaries.  Each
-compressed inode is on average 8 bytes in length (the exact length varies on
-file type, i.e. regular file, directory, symbolic link, and block/char device
-inodes have different sizes).
-
-2. USING SQUASHFS
------------------
-
-As squashfs is a read-only filesystem, the mksquashfs program must be used to
-create populated squashfs filesystems.  This and other squashfs utilities
-can be obtained from http://www.squashfs.org.  Usage instructions can be
-obtained from this site also.
-
-The squashfs-tools development tree is now located on kernel.org
-	git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git
-
-3. SQUASHFS FILESYSTEM DESIGN
------------------------------
-
-A squashfs filesystem consists of a maximum of nine parts, packed together on a
-byte alignment:
-
-	 ---------------
-	|  superblock 	|
-	|---------------|
-	|  compression  |
-	|    options    |
-	|---------------|
-	|  datablocks   |
-	|  & fragments  |
-	|---------------|
-	|  inode table	|
-	|---------------|
-	|   directory	|
-	|     table     |
-	|---------------|
-	|   fragment	|
-	|    table      |
-	|---------------|
-	|    export     |
-	|    table      |
-	|---------------|
-	|    uid/gid	|
-	|  lookup table	|
-	|---------------|
-	|     xattr     |
-	|     table	|
-	 ---------------
-
-Compressed data blocks are written to the filesystem as files are read from
-the source directory, and checked for duplicates.  Once all file data has been
-written the completed inode, directory, fragment, export, uid/gid lookup and
-xattr tables are written.
-
-3.1 Compression options
------------------------
-
-Compressors can optionally support compression specific options (e.g.
-dictionary size).  If non-default compression options have been used, then
-these are stored here.
-
-3.2 Inodes
-----------
-
-Metadata (inodes and directories) are compressed in 8Kbyte blocks.  Each
-compressed block is prefixed by a two byte length, the top bit is set if the
-block is uncompressed.  A block will be uncompressed if the -noI option is set,
-or if the compressed block was larger than the uncompressed block.
-
-Inodes are packed into the metadata blocks, and are not aligned to block
-boundaries, therefore inodes overlap compressed blocks.  Inodes are identified
-by a 48-bit number which encodes the location of the compressed metadata block
-containing the inode, and the byte offset into that block where the inode is
-placed (<block, offset>).
-
-To maximise compression there are different inodes for each file type
-(regular file, directory, device, etc.), the inode contents and length
-varying with the type.
-
-To further maximise compression, two types of regular file inode and
-directory inode are defined: inodes optimised for frequently occurring
-regular files and directories, and extended types where extra
-information has to be stored.
-
-3.3 Directories
----------------
-
-Like inodes, directories are packed into compressed metadata blocks, stored
-in a directory table.  Directories are accessed using the start address of
-the metablock containing the directory and the offset into the
-decompressed block (<block, offset>).
-
-Directories are organised in a slightly complex way, and are not simply
-a list of file names.  The organisation takes advantage of the
-fact that (in most cases) the inodes of the files will be in the same
-compressed metadata block, and therefore, can share the start block.
-Directories are therefore organised in a two level list, a directory
-header containing the shared start block value, and a sequence of directory
-entries, each of which share the shared start block.  A new directory header
-is written once/if the inode start block changes.  The directory
-header/directory entry list is repeated as many times as necessary.
-
-Directories are sorted, and can contain a directory index to speed up
-file lookup.  Directory indexes store one entry per metablock, each entry
-storing the index/filename mapping to the first directory header
-in each metadata block.  Directories are sorted in alphabetical order,
-and at lookup the index is scanned linearly looking for the first filename
-alphabetically larger than the filename being looked up.  At this point the
-location of the metadata block the filename is in has been found.
-The general idea of the index is to ensure only one metadata block needs to be
-decompressed to do a lookup irrespective of the length of the directory.
-This scheme has the advantage that it doesn't require extra memory overhead
-and doesn't require much extra storage on disk.
-
-3.4 File data
--------------
-
-Regular files consist of a sequence of contiguous compressed blocks, and/or a
-compressed fragment block (tail-end packed block).   The compressed size
-of each datablock is stored in a block list contained within the
-file inode.
-
-To speed up access to datablocks when reading 'large' files (256 Mbytes or
-larger), the code implements an index cache that caches the mapping from
-block index to datablock location on disk.
-
-The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
-retaining a simple and space-efficient block list on disk.  The cache
-is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
-Larger files use multiple slots, with 1.75 TiB files using all 8 slots.
-The index cache is designed to be memory efficient, and by default uses
-16 KiB.
-
-3.5 Fragment lookup table
--------------------------
-
-Regular files can contain a fragment index which is mapped to a fragment
-location on disk and compressed size using a fragment lookup table.  This
-fragment lookup table is itself stored compressed into metadata blocks.
-A second index table is used to locate these.  This second index table for
-speed of access (and because it is small) is read at mount time and cached
-in memory.
-
-3.6 Uid/gid lookup table
-------------------------
-
-For space efficiency regular files store uid and gid indexes, which are
-converted to 32-bit uids/gids using an id look up table.  This table is
-stored compressed into metadata blocks.  A second index table is used to
-locate these.  This second index table for speed of access (and because it
-is small) is read at mount time and cached in memory.
-
-3.7 Export table
-----------------
-
-To enable Squashfs filesystems to be exportable (via NFS etc.) filesystems
-can optionally (disabled with the -no-exports Mksquashfs option) contain
-an inode number to inode disk location lookup table.  This is required to
-enable Squashfs to map inode numbers passed in filehandles to the inode
-location on disk, which is necessary when the export code reinstantiates
-expired/flushed inodes.
-
-This table is stored compressed into metadata blocks.  A second index table is
-used to locate these.  This second index table for speed of access (and because
-it is small) is read at mount time and cached in memory.
-
-3.8 Xattr table
----------------
-
-The xattr table contains extended attributes for each inode.  The xattrs
-for each inode are stored in a list, each list entry containing a type,
-name and value field.  The type field encodes the xattr prefix
-("user.", "trusted." etc) and it also encodes how the name/value fields
-should be interpreted.  Currently the type indicates whether the value
-is stored inline (in which case the value field contains the xattr value),
-or if it is stored out of line (in which case the value field stores a
-reference to where the actual value is stored).  This allows large values
-to be stored out of line improving scanning and lookup performance and it
-also allows values to be de-duplicated, the value being stored once, and
-all other occurrences holding an out of line reference to that value.
-
-The xattr lists are packed into compressed 8K metadata blocks.
-To reduce overhead in inodes, rather than storing the on-disk
-location of the xattr list inside each inode, a 32-bit xattr id
-is stored.  This xattr id is mapped into the location of the xattr
-list using a second xattr id lookup table.
-
-4. TODOS AND OUTSTANDING ISSUES
--------------------------------
-
-4.1 Todo list
--------------
-
-Implement ACL support.
-
-4.2 Squashfs internal cache
----------------------------
-
-Blocks in Squashfs are compressed.  To avoid repeatedly decompressing
-recently accessed data Squashfs uses two small metadata and fragment caches.
-
-The cache is not used for file datablocks, these are decompressed and cached in
-the page-cache in the normal way.  The cache is used to temporarily cache
-fragment and metadata blocks which have been read as a result of a metadata
-(i.e. inode or directory) or fragment access.  Because metadata and fragments
-are packed together into blocks (to gain greater compression) the read of a
-particular piece of metadata or fragment will retrieve other metadata/fragments
-which have been packed with it, these because of locality-of-reference may be
-read in the near future. Temporarily caching them ensures they are available
-for near future access without requiring an additional read and decompress.
-
-In the future this internal cache may be replaced with an implementation which
-uses the kernel page cache.  Because the page cache operates on page sized
-units this may introduce additional complexity in terms of locking and
-associated race conditions.
-- 


From 86beb976700b26576fe522a94a0b3a4e3d5ce424 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:24 +0100
Subject: docs: filesystems: convert sysfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust document and section titles;
- use :field: markup;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/5c480dcb467315b5df6e25372a65e473b585c36d.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/sysfs.rst | 418 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/sysfs.txt | 408 -----------------------------------
 3 files changed, 419 insertions(+), 408 deletions(-)
 create mode 100644 Documentation/filesystems/sysfs.rst
 delete mode 100644 Documentation/filesystems/sysfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 97a5f65ae509..bafe92c72433 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -87,5 +87,6 @@ Documentation for filesystem implementations.
    relay
    romfs
    squashfs
+   sysfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/sysfs.rst b/Documentation/filesystems/sysfs.rst
new file mode 100644
index 000000000000..290891c3fecb
--- /dev/null
+++ b/Documentation/filesystems/sysfs.rst
@@ -0,0 +1,418 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================================
+sysfs - _The_ filesystem for exporting kernel objects
+=====================================================
+
+Patrick Mochel	<mochel@osdl.org>
+
+Mike Murphy <mamurph@cs.clemson.edu>
+
+:Revised:    16 August 2011
+:Original:   10 January 2003
+
+
+What it is:
+~~~~~~~~~~~
+
+sysfs is a RAM-based filesystem initially based on ramfs. It provides
+a means to export kernel data structures, their attributes, and the
+linkages between them to userspace.
+
+sysfs is tied inherently to the kobject infrastructure. Please read
+Documentation/kobject.txt for more information concerning the kobject
+interface.
+
+
+Using sysfs
+~~~~~~~~~~~
+
+sysfs is always compiled in if CONFIG_SYSFS is defined. You can access
+it by doing::
+
+    mount -t sysfs sysfs /sys
+
+
+Directory Creation
+~~~~~~~~~~~~~~~~~~
+
+For every kobject that is registered with the system, a directory is
+created for it in sysfs. That directory is created as a subdirectory
+of the kobject's parent, expressing internal object hierarchies to
+userspace. Top-level directories in sysfs represent the common
+ancestors of object hierarchies; i.e. the subsystems the objects
+belong to.
+
+Sysfs internally stores a pointer to the kobject that implements a
+directory in the kernfs_node object associated with the directory. In
+the past this kobject pointer has been used by sysfs to do reference
+counting directly on the kobject whenever the file is opened or closed.
+With the current sysfs implementation the kobject reference count is
+only modified directly by the function sysfs_schedule_callback().
+
+
+Attributes
+~~~~~~~~~~
+
+Attributes can be exported for kobjects in the form of regular files in
+the filesystem. Sysfs forwards file I/O operations to methods defined
+for the attributes, providing a means to read and write kernel
+attributes.
+
+Attributes should be ASCII text files, preferably with only one value
+per file. Because storing only one value per file can be inefficient,
+it is socially acceptable to express an array of values of the same
+type.
+
+Mixing types, expressing multiple lines of data, and doing fancy
+formatting of data is heavily frowned upon. Doing these things may get
+you publicly humiliated and your code rewritten without notice.
+
+
+An attribute definition is simply::
+
+    struct attribute {
+            char                    *name;
+            struct module           *owner;
+            umode_t                 mode;
+    };
+
+
+    int sysfs_create_file(struct kobject *kobj, const struct attribute *attr);
+    void sysfs_remove_file(struct kobject *kobj, const struct attribute *attr);
+
+
+A bare attribute contains no means to read or write the value of the
+attribute. Subsystems are encouraged to define their own attribute
+structure and wrapper functions for adding and removing attributes for
+a specific object type.
+
+For example, the driver model defines struct device_attribute like::
+
+    struct device_attribute {
+	    struct attribute	attr;
+	    ssize_t (*show)(struct device *dev, struct device_attribute *attr,
+			    char *buf);
+	    ssize_t (*store)(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count);
+    };
+
+    int device_create_file(struct device *, const struct device_attribute *);
+    void device_remove_file(struct device *, const struct device_attribute *);
+
+It also defines this helper for defining device attributes::
+
+    #define DEVICE_ATTR(_name, _mode, _show, _store) \
+    struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store)
+
+For example, declaring::
+
+    static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
+
+is equivalent to doing::
+
+    static struct device_attribute dev_attr_foo = {
+	    .attr = {
+		    .name = "foo",
+		    .mode = S_IWUSR | S_IRUGO,
+	    },
+	    .show = show_foo,
+	    .store = store_foo,
+    };
+
+Note that, as stated in include/linux/kernel.h ("OTHER_WRITABLE?  Generally
+considered a bad idea."), trying to make a sysfs file writable by
+everyone will fail, reverting to RO mode for "Others".
+
+For the common cases sysfs.h provides convenience macros to make
+defining attributes easier as well as making code more concise and
+readable. Provided the methods are named foo_show() and foo_store(),
+the above case could be shortened to::
+
+    static struct device_attribute dev_attr_foo = __ATTR_RW(foo);
+
+The list of helpers available to define your wrapper function is:
+
+__ATTR_RO(name):
+    assumes default name_show and mode 0444.
+__ATTR_WO(name):
+    assumes a name_store only and is restricted to mode
+    0200, that is, root write access only.
+__ATTR_RO_MODE(name, mode):
+    for more restrictive RO access; currently the
+    only use case is the EFI System Resource Table
+    (see drivers/firmware/efi/esrt.c).
+__ATTR_RW(name):
+    assumes default name_show, name_store and setting
+    mode to 0644.
+__ATTR_NULL:
+    sets the name to NULL and is used as an end-of-list
+    indicator (see: kernel/workqueue.c).
+
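+The naming convention these helpers rely on can be illustrated outside
+the kernel. The following userspace sketch is only an approximation.
+struct my_attribute, MY_ATTR_RO() and MY_ATTR_RW() are hypothetical
+stand-ins, not the definitions in include/linux/sysfs.h. They do use the
+same token pasting to derive foo_show() and foo_store() from the
+attribute name, and the same default modes as listed above::
+
+    #include <stdio.h>
+    #include <sys/types.h>
+
+    /* Hypothetical stand-in for an attribute with show/store methods. */
+    struct my_attribute {
+        const char *name;
+        mode_t mode;
+        int (*show)(char *buf, size_t size);
+        int (*store)(const char *buf, size_t count);
+    };
+
+    /* Derive <name>_show and <name>_store by token pasting. */
+    #define MY_ATTR_RO(_name) \
+        { .name = #_name, .mode = 0444, .show = _name##_show, .store = NULL }
+    #define MY_ATTR_RW(_name) \
+        { .name = #_name, .mode = 0644, .show = _name##_show, \
+          .store = _name##_store }
+
+    static int foo_value = 42;
+
+    int foo_show(char *buf, size_t size)
+    {
+        return snprintf(buf, size, "%d\n", foo_value);
+    }
+
+    int foo_store(const char *buf, size_t count)
+    {
+        /* Parse one decimal value; report failure as -1. */
+        return sscanf(buf, "%d", &foo_value) == 1 ? (int)count : -1;
+    }
+
+    struct my_attribute attr_foo_rw = MY_ATTR_RW(foo);
+    struct my_attribute attr_foo_ro = MY_ATTR_RO(foo);
+
+With MY_ATTR_RO() the store pointer stays NULL. This mirrors how a
+read-only attribute simply has no store() method.
+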
+Subsystem-Specific Callbacks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a subsystem defines a new attribute type, it must implement a
+set of sysfs operations for forwarding read and write calls to the
+show and store methods of the attribute owners::
+
+    struct sysfs_ops {
+	    ssize_t (*show)(struct kobject *, struct attribute *, char *);
+	    ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t);
+    };
+
+[ Subsystems should have already defined a struct kobj_type as a
+descriptor for this type, which is where the sysfs_ops pointer is
+stored. See the kobject documentation for more information. ]
+
+When a file is read or written, sysfs calls the appropriate method
+for the type. The method then translates the generic struct kobject
+and struct attribute pointers to the appropriate pointer types, and
+calls the associated methods.
+
+
+To illustrate::
+
+    #define to_dev(obj) container_of(obj, struct device, kobj)
+    #define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
+
+    static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
+				char *buf)
+    {
+	    struct device_attribute *dev_attr = to_dev_attr(attr);
+	    struct device *dev = to_dev(kobj);
+	    ssize_t ret = -EIO;
+
+	    if (dev_attr->show)
+		    ret = dev_attr->show(dev, dev_attr, buf);
+	    if (ret >= (ssize_t)PAGE_SIZE) {
+		    printk("dev_attr_show: %pS returned bad count\n",
+				    dev_attr->show);
+	    }
+	    return ret;
+    }
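+
+The to_dev() and to_dev_attr() helpers work because the generic
+structures are embedded in the subsystem-specific ones. container_of()
+therefore only has to subtract the member offset. The userspace sketch
+below reproduces that dispatch under simplifying assumptions. The my_*
+types, my_attr_show() and id_show() are hypothetical, and this
+container_of() is a reduced form of the kernel macro without its type
+checking::
+
+    #include <stddef.h>
+    #include <stdio.h>
+    #include <sys/types.h>
+
+    /* Reduced container_of(): member pointer -> enclosing structure. */
+    #define container_of(ptr, type, member) \
+        ((type *)((char *)(ptr) - offsetof(type, member)))
+
+    struct my_kobject { const char *name; };
+    struct my_attribute { const char *name; };
+
+    struct my_device {
+        int id;
+        struct my_kobject kobj;        /* embedded generic object */
+    };
+
+    struct my_device_attribute {
+        struct my_attribute attr;      /* embedded generic attribute */
+        ssize_t (*show)(struct my_device *dev,
+                        struct my_device_attribute *attr,
+                        char *buf, size_t size);
+    };
+
+    /* Generic entry point: only sees the embedded structures. */
+    ssize_t my_attr_show(struct my_kobject *kobj, struct my_attribute *attr,
+                         char *buf, size_t size)
+    {
+        struct my_device_attribute *dev_attr =
+            container_of(attr, struct my_device_attribute, attr);
+        struct my_device *dev = container_of(kobj, struct my_device, kobj);
+
+        if (!dev_attr->show)
+            return -1;    /* dev_attr_show() above returns -EIO here */
+        return dev_attr->show(dev, dev_attr, buf, size);
+    }
+
+    /* Example subsystem-specific show() method. */
+    ssize_t id_show(struct my_device *dev, struct my_device_attribute *attr,
+                    char *buf, size_t size)
+    {
+        (void)attr;
+        return snprintf(buf, size, "%d\n", dev->id);
+    }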
+
+
+
+Reading/Writing Attribute Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To read or write attributes, show() or store() methods must be
+specified when declaring the attribute. The method types should be as
+simple as those defined for device attributes::
+
+    ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf);
+    ssize_t (*store)(struct device *dev, struct device_attribute *attr,
+		    const char *buf, size_t count);
+
+In other words, they should take only an object, an attribute, and a buffer as parameters.
+
+
+sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
+method. Sysfs will call the method exactly once for each read or
+write. This forces the following behavior on the method
+implementations:
+
+- On read(2), the show() method should fill the entire buffer.
+  Recall that an attribute should only be exporting one value, or an
+  array of similar values, so this shouldn't be that expensive.
+
+  This allows userspace to do partial reads and forward seeks
+  arbitrarily over the entire file at will. If userspace seeks back to
+  zero or does a pread(2) with an offset of '0', the show() method will
+  be called again, rearmed, to fill the buffer.
+
+- On write(2), sysfs expects the entire buffer to be passed during the
+  first write. Sysfs then passes the entire buffer to the store() method.
+  A terminating null is added after the data on stores. This makes
+  functions like sysfs_streq() safe to use.
+
+  When writing sysfs files, userspace processes should first read the
+  entire file, modify the values it wishes to change, then write the
+  entire buffer back.
+
+  Attribute method implementations should operate on an identical
+  buffer when reading and writing values.
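+
+The buffer handed to store() is NUL-terminated and often still ends
+with the newline written by tools such as echo(1), so a plain strcmp()
+against a keyword would fail. sysfs_streq() treats a single trailing
+newline as the end of the string. The userspace sketch below mirrors
+that comparison rule. my_sysfs_streq() and parse_mode() are
+hypothetical names, and a real store() would return -EINVAL rather than
+-1 for unrecognised input::
+
+    #include <stdbool.h>
+
+    /* Equal, or equal apart from one trailing '\n' on either side. */
+    bool my_sysfs_streq(const char *s1, const char *s2)
+    {
+        while (*s1 && *s1 == *s2) {
+            s1++;
+            s2++;
+        }
+        if (*s1 == *s2)
+            return true;
+        if (!*s1 && *s2 == '\n' && !s2[1])
+            return true;
+        if (*s1 == '\n' && !s1[1] && !*s2)
+            return true;
+        return false;
+    }
+
+    /* Hypothetical store()-style parser for an "on"/"off" attribute. */
+    int parse_mode(const char *buf)
+    {
+        if (my_sysfs_streq(buf, "on"))
+            return 1;
+        if (my_sysfs_streq(buf, "off"))
+            return 0;
+        return -1;
+    }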
+
+Other notes:
+
+- Writing causes the show() method to be rearmed regardless of current
+  file position.
+
+- The buffer will always be PAGE_SIZE bytes in length. On i386, this
+  is 4096.
+
+- show() methods should return the number of bytes printed into the
+  buffer. This is the return value of scnprintf().
+
+- show() must not use snprintf() when formatting the value to be
+  returned to user space. If you can guarantee that an overflow
+  will never happen, you can use sprintf(); otherwise you must use
+  scnprintf().
+
+- store() should return the number of bytes used from the buffer. If the
+  entire buffer has been used, just return the count argument.
+
+- show() or store() can always return errors. If a bad value comes
+  through, be sure to return an error.
+
+- The object passed to the methods will be pinned in memory via sysfs
+  reference counting its embedded object. However, the physical
+  entity (e.g. device) the object represents may not be present. Be
+  sure to have a way to check this, if necessary.
+
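+The return-value rule above matters most when a show() method builds
+its output in several steps. snprintf() returns the length the output
+would have had, while scnprintf() returns the number of characters
+actually stored, excluding the trailing NUL. The userspace sketch below
+copies that rule into a hypothetical my_scnprintf(). It then uses it in
+a hypothetical show_values() that emits an array of same-type values.
+Because of that rule, the running length can never point past the end
+of the buffer::
+
+    #include <stdarg.h>
+    #include <stddef.h>
+    #include <stdio.h>
+
+    /* Like scnprintf(): returns characters stored, excluding the NUL. */
+    int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
+    {
+        va_list args;
+        int i;
+
+        va_start(args, fmt);
+        i = vsnprintf(buf, size, fmt, args);
+        va_end(args);
+
+        if (i < 0)
+            return 0;                       /* output error */
+        if ((size_t)i < size)
+            return i;                       /* everything fit */
+        return size ? (int)(size - 1) : 0;  /* truncated */
+    }
+
+    /* Emits "v0 v1 ... vn\n", truncating safely if buf is too small. */
+    int show_values(char *buf, size_t size, const int *vals, size_t n)
+    {
+        int len = 0;
+        size_t k;
+
+        for (k = 0; k < n; k++)
+            len += my_scnprintf(buf + len, size - len, "%s%d",
+                                k ? " " : "", vals[k]);
+        len += my_scnprintf(buf + len, size - len, "\n");
+        return len;
+    }
+
+Accumulating snprintf() return values in the same loop would let len
+exceed size once the output is truncated. The next call would then
+receive a pointer beyond the buffer.
+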
+
+A very simple (and naive) implementation of a device attribute is::
+
+    static ssize_t show_name(struct device *dev, struct device_attribute *attr,
+			    char *buf)
+    {
+	    return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name);
+    }
+
+    static ssize_t store_name(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count)
+    {
+	    snprintf(dev->name, sizeof(dev->name), "%.*s",
+		    (int)min(count, sizeof(dev->name) - 1), buf);
+	    return count;
+    }
+
+    static DEVICE_ATTR(name, S_IWUSR | S_IRUGO, show_name, store_name);
+
+
+(Note that the real implementation doesn't allow userspace to set the
+name for a device.)
+
+
+Top Level Directory Layout
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The sysfs directory arrangement exposes the relationship of kernel
+data structures.
+
+The top level sysfs directory looks like::
+
+    block/
+    bus/
+    class/
+    dev/
+    devices/
+    firmware/
+    net/
+    fs/
+
+devices/ contains a filesystem representation of the device tree. It maps
+directly to the internal kernel device tree, which is a hierarchy of
+struct device.
+
+bus/ contains a flat directory layout of the various bus types in the
+kernel. Each bus's directory contains two subdirectories::
+
+	devices/
+	drivers/
+
+devices/ contains symlinks for each device discovered in the system
+that point to the device's directory under the top-level devices/ directory.
+
+drivers/ contains a directory for each device driver that is loaded
+for devices on that particular bus (this assumes that drivers do not
+span multiple bus types).
+
+fs/ contains a directory for some filesystems.  Currently each
+filesystem wanting to export attributes must create its own hierarchy
+below fs/ (see ./fuse.txt for an example).
+
+dev/ contains two directories, char/ and block/. Inside these two
+directories there are symlinks named <major>:<minor>.  These symlinks
+point to the sysfs directory for the given device.  /sys/dev provides a
+quick way to look up the sysfs interface for a device from the result of
+a stat(2) operation.
+
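+As an illustration of the /sys/dev lookup, the sketch below turns a
+stat(2) result into the matching /sys/dev path. It uses the major() and
+minor() macros that glibc provides in <sys/sysmacros.h>.
+sysfs_dev_path() is a hypothetical helper, and it assumes sysfs is
+mounted on /sys::
+
+    #include <stdio.h>
+    #include <sys/stat.h>
+    #include <sys/sysmacros.h>
+    #include <sys/types.h>
+
+    /*
+     * Writes "/sys/dev/{char,block}/<major>:<minor>" into buf.
+     * Returns 0 on success, -1 if st does not describe a device node
+     * or the buffer is too small.
+     */
+    int sysfs_dev_path(const struct stat *st, char *buf, size_t size)
+    {
+        const char *kind;
+        int n;
+
+        if (S_ISCHR(st->st_mode))
+            kind = "char";
+        else if (S_ISBLK(st->st_mode))
+            kind = "block";
+        else
+            return -1;
+
+        n = snprintf(buf, size, "/sys/dev/%s/%u:%u", kind,
+                     major(st->st_rdev), minor(st->st_rdev));
+        return (n < 0 || (size_t)n >= size) ? -1 : 0;
+    }
+
+For example, passing the stat(2) result for /dev/null (character device
+1:3 on Linux) would give /sys/dev/char/1:3.
+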
+More information on driver-model specific features can be found in
+Documentation/driver-api/driver-model/.
+
+
+TODO: Finish this section.
+
+
+Current Interfaces
+~~~~~~~~~~~~~~~~~~
+
+The following interface layers currently exist in sysfs:
+
+
+devices (include/linux/device.h)
+--------------------------------
+Structure::
+
+    struct device_attribute {
+	    struct attribute	attr;
+	    ssize_t (*show)(struct device *dev, struct device_attribute *attr,
+			    char *buf);
+	    ssize_t (*store)(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count);
+    };
+
+Declaring::
+
+    DEVICE_ATTR(_name, _mode, _show, _store);
+
+Creation/Removal::
+
+    int device_create_file(struct device *dev, const struct device_attribute * attr);
+    void device_remove_file(struct device *dev, const struct device_attribute * attr);
+
+
+bus drivers (include/linux/device.h)
+------------------------------------
+Structure::
+
+    struct bus_attribute {
+	    struct attribute        attr;
+	    ssize_t (*show)(struct bus_type *, char * buf);
+	    ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
+    };
+
+Declaring::
+
+    static BUS_ATTR_RW(name);
+    static BUS_ATTR_RO(name);
+    static BUS_ATTR_WO(name);
+
+Creation/Removal::
+
+    int bus_create_file(struct bus_type *, struct bus_attribute *);
+    void bus_remove_file(struct bus_type *, struct bus_attribute *);
+
+
+device drivers (include/linux/device.h)
+---------------------------------------
+
+Structure::
+
+    struct driver_attribute {
+	    struct attribute        attr;
+	    ssize_t (*show)(struct device_driver *, char * buf);
+	    ssize_t (*store)(struct device_driver *, const char * buf,
+			    size_t count);
+    };
+
+Declaring::
+
+    static DRIVER_ATTR_RO(_name);
+    static DRIVER_ATTR_RW(_name);
+
+Creation/Removal::
+
+    int driver_create_file(struct device_driver *, const struct driver_attribute *);
+    void driver_remove_file(struct device_driver *, const struct driver_attribute *);
+
+
+Documentation
+~~~~~~~~~~~~~
+
+The sysfs directory structure and the attributes in each directory define an
+ABI between the kernel and user space. As for any ABI, it is important that
+this ABI is stable and properly documented. All new sysfs attributes must be
+documented in Documentation/ABI. See also Documentation/ABI/README for more
+information.
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
deleted file mode 100644
index ddf15b1b0d5a..000000000000
--- a/Documentation/filesystems/sysfs.txt
+++ /dev/null
@@ -1,408 +0,0 @@
-
-sysfs - _The_ filesystem for exporting kernel objects. 
-
-Patrick Mochel	<mochel@osdl.org>
-Mike Murphy <mamurph@cs.clemson.edu>
-
-Revised:    16 August 2011
-Original:   10 January 2003
-
-
-What it is:
-~~~~~~~~~~~
-
-sysfs is a ram-based filesystem initially based on ramfs. It provides
-a means to export kernel data structures, their attributes, and the 
-linkages between them to userspace. 
-
-sysfs is tied inherently to the kobject infrastructure. Please read
-Documentation/kobject.txt for more information concerning the kobject
-interface. 
-
-
-Using sysfs
-~~~~~~~~~~~
-
-sysfs is always compiled in if CONFIG_SYSFS is defined. You can access
-it by doing:
-
-    mount -t sysfs sysfs /sys 
-
-
-Directory Creation
-~~~~~~~~~~~~~~~~~~
-
-For every kobject that is registered with the system, a directory is
-created for it in sysfs. That directory is created as a subdirectory
-of the kobject's parent, expressing internal object hierarchies to
-userspace. Top-level directories in sysfs represent the common
-ancestors of object hierarchies; i.e. the subsystems the objects
-belong to. 
-
-Sysfs internally stores a pointer to the kobject that implements a
-directory in the kernfs_node object associated with the directory. In
-the past this kobject pointer has been used by sysfs to do reference
-counting directly on the kobject whenever the file is opened or closed.
-With the current sysfs implementation the kobject reference count is
-only modified directly by the function sysfs_schedule_callback().
-
-
-Attributes
-~~~~~~~~~~
-
-Attributes can be exported for kobjects in the form of regular files in
-the filesystem. Sysfs forwards file I/O operations to methods defined
-for the attributes, providing a means to read and write kernel
-attributes.
-
-Attributes should be ASCII text files, preferably with only one value
-per file. It is noted that it may not be efficient to contain only one
-value per file, so it is socially acceptable to express an array of
-values of the same type. 
-
-Mixing types, expressing multiple lines of data, and doing fancy
-formatting of data is heavily frowned upon. Doing these things may get
-you publicly humiliated and your code rewritten without notice. 
-
-
-An attribute definition is simply:
-
-struct attribute {
-        char                    * name;
-        struct module		*owner;
-        umode_t                 mode;
-};
-
-
-int sysfs_create_file(struct kobject * kobj, const struct attribute * attr);
-void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);
-
-
-A bare attribute contains no means to read or write the value of the
-attribute. Subsystems are encouraged to define their own attribute
-structure and wrapper functions for adding and removing attributes for
-a specific object type. 
-
-For example, the driver model defines struct device_attribute like:
-
-struct device_attribute {
-	struct attribute	attr;
-	ssize_t (*show)(struct device *dev, struct device_attribute *attr,
-			char *buf);
-	ssize_t (*store)(struct device *dev, struct device_attribute *attr,
-			 const char *buf, size_t count);
-};
-
-int device_create_file(struct device *, const struct device_attribute *);
-void device_remove_file(struct device *, const struct device_attribute *);
-
-It also defines this helper for defining device attributes: 
-
-#define DEVICE_ATTR(_name, _mode, _show, _store) \
-struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store)
-
-For example, declaring
-
-static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
-
-is equivalent to doing:
-
-static struct device_attribute dev_attr_foo = {
-	.attr = {
-		.name = "foo",
-		.mode = S_IWUSR | S_IRUGO,
-	},
-	.show = show_foo,
-	.store = store_foo,
-};
-
-Note as stated in include/linux/kernel.h "OTHER_WRITABLE?  Generally
-considered a bad idea." so trying to set a sysfs file writable for
-everyone will fail reverting to RO mode for "Others".
-
-For the common cases sysfs.h provides convenience macros to make
-defining attributes easier as well as making code more concise and
-readable. The above case could be shortened to:
-
-static struct device_attribute dev_attr_foo = __ATTR_RW(foo);
-
-the list of helpers available to define your wrapper function is:
-__ATTR_RO(name): assumes default name_show and mode 0444
-__ATTR_WO(name): assumes a name_store only and is restricted to mode
-                 0200 that is root write access only.
-__ATTR_RO_MODE(name, mode): fore more restrictive RO access currently
-                 only use case is the EFI System Resource Table
-                 (see drivers/firmware/efi/esrt.c)
-__ATTR_RW(name): assumes default name_show, name_store and setting
-                 mode to 0644.
-__ATTR_NULL: which sets the name to NULL and is used as end of list
-                 indicator (see: kernel/workqueue.c)
-
-Subsystem-Specific Callbacks
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-When a subsystem defines a new attribute type, it must implement a
-set of sysfs operations for forwarding read and write calls to the
-show and store methods of the attribute owners. 
-
-struct sysfs_ops {
-        ssize_t (*show)(struct kobject *, struct attribute *, char *);
-        ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t);
-};
-
-[ Subsystems should have already defined a struct kobj_type as a
-descriptor for this type, which is where the sysfs_ops pointer is
-stored. See the kobject documentation for more information. ]
-
-When a file is read or written, sysfs calls the appropriate method
-for the type. The method then translates the generic struct kobject
-and struct attribute pointers to the appropriate pointer types, and
-calls the associated methods. 
-
-
-To illustrate:
-
-#define to_dev(obj) container_of(obj, struct device, kobj)
-#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
-
-static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr,
-                             char *buf)
-{
-        struct device_attribute *dev_attr = to_dev_attr(attr);
-        struct device *dev = to_dev(kobj);
-        ssize_t ret = -EIO;
-
-        if (dev_attr->show)
-                ret = dev_attr->show(dev, dev_attr, buf);
-        if (ret >= (ssize_t)PAGE_SIZE) {
-                printk("dev_attr_show: %pS returned bad count\n",
-                                dev_attr->show);
-        }
-        return ret;
-}
-
-
-
-Reading/Writing Attribute Data
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-To read or write attributes, show() or store() methods must be
-specified when declaring the attribute. The method types should be as
-simple as those defined for device attributes:
-
-ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf);
-ssize_t (*store)(struct device *dev, struct device_attribute *attr,
-                 const char *buf, size_t count);
-
-IOW, they should take only an object, an attribute, and a buffer as parameters.
-
-
-sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
-method. Sysfs will call the method exactly once for each read or
-write. This forces the following behavior on the method
-implementations: 
-
-- On read(2), the show() method should fill the entire buffer. 
-  Recall that an attribute should only be exporting one value, or an
-  array of similar values, so this shouldn't be that expensive. 
-
-  This allows userspace to do partial reads and forward seeks
-  arbitrarily over the entire file at will. If userspace seeks back to
-  zero or does a pread(2) with an offset of '0' the show() method will
-  be called again, rearmed, to fill the buffer.
-
-- On write(2), sysfs expects the entire buffer to be passed during the
-  first write. Sysfs then passes the entire buffer to the store() method.
-  A terminating null is added after the data on stores. This makes
-  functions like sysfs_streq() safe to use.
-
-  When writing sysfs files, userspace processes should first read the
-  entire file, modify the values it wishes to change, then write the
-  entire buffer back. 
-
-  Attribute method implementations should operate on an identical
-  buffer when reading and writing values. 
-
-Other notes:
-
-- Writing causes the show() method to be rearmed regardless of current
-  file position.
-
-- The buffer will always be PAGE_SIZE bytes in length. On i386, this
-  is 4096. 
-
-- show() methods should return the number of bytes printed into the
-  buffer. This is the return value of scnprintf().
-
-- show() must not use snprintf() when formatting the value to be
-  returned to user space. If you can guarantee that an overflow
-  will never happen you can use sprintf() otherwise you must use
-  scnprintf().
-
-- store() should return the number of bytes used from the buffer. If the
-  entire buffer has been used, just return the count argument.
-
-- show() or store() can always return errors. If a bad value comes
-  through, be sure to return an error.
-
-- The object passed to the methods will be pinned in memory via sysfs
-  referencing counting its embedded object. However, the physical 
-  entity (e.g. device) the object represents may not be present. Be 
-  sure to have a way to check this, if necessary. 
-
-
-A very simple (and naive) implementation of a device attribute is:
-
-static ssize_t show_name(struct device *dev, struct device_attribute *attr,
-                         char *buf)
-{
-	return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name);
-}
-
-static ssize_t store_name(struct device *dev, struct device_attribute *attr,
-                          const char *buf, size_t count)
-{
-        snprintf(dev->name, sizeof(dev->name), "%.*s",
-                 (int)min(count, sizeof(dev->name) - 1), buf);
-	return count;
-}
-
-static DEVICE_ATTR(name, S_IRUGO, show_name, store_name);
-
-
-(Note that the real implementation doesn't allow userspace to set the 
-name for a device.)
-
-
-Top Level Directory Layout
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The sysfs directory arrangement exposes the relationship of kernel
-data structures. 
-
-The top level sysfs directory looks like:
-
-block/
-bus/
-class/
-dev/
-devices/
-firmware/
-net/
-fs/
-
-devices/ contains a filesystem representation of the device tree. It maps
-directly to the internal kernel device tree, which is a hierarchy of
-struct device. 
-
-bus/ contains flat directory layout of the various bus types in the
-kernel. Each bus's directory contains two subdirectories:
-
-	devices/
-	drivers/
-
-devices/ contains symlinks for each device discovered in the system
-that point to the device's directory under root/.
-
-drivers/ contains a directory for each device driver that is loaded
-for devices on that particular bus (this assumes that drivers do not
-span multiple bus types).
-
-fs/ contains a directory for some filesystems.  Currently each
-filesystem wanting to export attributes must create its own hierarchy
-below fs/ (see ./fuse.txt for an example).
-
-dev/ contains two directories char/ and block/. Inside these two
-directories there are symlinks named <major>:<minor>.  These symlinks
-point to the sysfs directory for the given device.  /sys/dev provides a
-quick way to lookup the sysfs interface for a device from the result of
-a stat(2) operation.
-
-More information can driver-model specific features can be found in
-Documentation/driver-api/driver-model/.
-
-
-TODO: Finish this section.
-
-
-Current Interfaces
-~~~~~~~~~~~~~~~~~~
-
-The following interface layers currently exist in sysfs:
-
-
-- devices (include/linux/device.h)
-----------------------------------
-Structure:
-
-struct device_attribute {
-	struct attribute	attr;
-	ssize_t (*show)(struct device *dev, struct device_attribute *attr,
-			char *buf);
-	ssize_t (*store)(struct device *dev, struct device_attribute *attr,
-			 const char *buf, size_t count);
-};
-
-Declaring:
-
-DEVICE_ATTR(_name, _mode, _show, _store);
-
-Creation/Removal:
-
-int device_create_file(struct device *dev, const struct device_attribute * attr);
-void device_remove_file(struct device *dev, const struct device_attribute * attr);
-
-
-- bus drivers (include/linux/device.h)
---------------------------------------
-Structure:
-
-struct bus_attribute {
-        struct attribute        attr;
-        ssize_t (*show)(struct bus_type *, char * buf);
-        ssize_t (*store)(struct bus_type *, const char * buf, size_t count);
-};
-
-Declaring:
-
-static BUS_ATTR_RW(name);
-static BUS_ATTR_RO(name);
-static BUS_ATTR_WO(name);
-
-Creation/Removal:
-
-int bus_create_file(struct bus_type *, struct bus_attribute *);
-void bus_remove_file(struct bus_type *, struct bus_attribute *);
-
-
-- device drivers (include/linux/device.h)
------------------------------------------
-
-Structure:
-
-struct driver_attribute {
-        struct attribute        attr;
-        ssize_t (*show)(struct device_driver *, char * buf);
-        ssize_t (*store)(struct device_driver *, const char * buf,
-                         size_t count);
-};
-
-Declaring:
-
-DRIVER_ATTR_RO(_name)
-DRIVER_ATTR_RW(_name)
-
-Creation/Removal:
-
-int driver_create_file(struct device_driver *, const struct driver_attribute *);
-void driver_remove_file(struct device_driver *, const struct driver_attribute *);
-
-
-Documentation
-~~~~~~~~~~~~~
-
-The sysfs directory structure and the attributes in each directory define an
-ABI between the kernel and user space. As for any ABI, it is important that
-this ABI is stable and properly documented. All new sysfs attributes must be
-documented in Documentation/ABI. See also Documentation/ABI/README for more
-information.
-- 
cgit 


From 826a613d3f81695022f324a5cb84fe73ec09e51d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:25 +0100
Subject: docs: filesystems: convert sysv-fs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/5b96a6efba95773af439ab25a7dbe4d0edf8c867.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst   |   1 +
 Documentation/filesystems/sysv-fs.rst | 264 ++++++++++++++++++++++++++++++++++
 Documentation/filesystems/sysv-fs.txt | 197 -------------------------
 3 files changed, 265 insertions(+), 197 deletions(-)
 create mode 100644 Documentation/filesystems/sysv-fs.rst
 delete mode 100644 Documentation/filesystems/sysv-fs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index bafe92c72433..d583b8b35196 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -88,5 +88,6 @@ Documentation for filesystem implementations.
    romfs
    squashfs
    sysfs
+   sysv-fs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/sysv-fs.rst b/Documentation/filesystems/sysv-fs.rst
new file mode 100644
index 000000000000..89e40911ad7c
--- /dev/null
+++ b/Documentation/filesystems/sysv-fs.rst
@@ -0,0 +1,264 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+SystemV Filesystem
+==================
+
+It implements all of:
+
+  - Xenix FS,
+  - SystemV/386 FS,
+  - Coherent FS.
+
+To install:
+
+* Answer the 'System V and Coherent filesystem support' question with 'y'
+  when configuring the kernel.
+* To mount a disk or a partition, use::
+
+    mount [-r] -t sysv device mountpoint
+
+  The file system type names::
+
+               -t sysv
+               -t xenix
+               -t coherent
+
+  may be used interchangeably, but the last two will eventually disappear.
+
+Bugs in the present implementation:
+
+- Coherent FS:
+
+  - The "free list interleave" n:m is currently ignored.
+  - Only file systems with no filesystem name and no pack name are recognized.
+    (See Coherent "man mkfs" for a description of these features.)
+
+- SystemV Release 2 FS:
+
+  The superblock is only searched for in blocks 9, 15 and 18, which
+  correspond to the beginning of track 1 on floppy disks. There is no
+  support for this FS on hard disks yet.
+
+
+These filesystems are rather similar. Here is a comparison with Minix FS:
+
+* Linux fdisk reports on partitions
+
+  - Minix FS     0x81 Linux/Minix
+  - Xenix FS     ??
+  - SystemV FS   ??
+  - Coherent FS  0x08 AIX bootable
+
+* Size of a block or zone (data allocation unit on disk)
+
+  - Minix FS     1024
+  - Xenix FS     1024 (also 512 ??)
+  - SystemV FS   1024 (also 512 and 2048)
+  - Coherent FS   512
+
+* General layout: all have one boot block, one super block and
+  separate areas for inodes and for directories/data.
+  On SystemV Release 2 FS (e.g. Microport) the first track is reserved and
+  all the block numbers (including the super block) are offset by one track.
+
+* Byte ordering of "short" (16 bit entities) on disk:
+
+  - Minix FS     little endian  0 1
+  - Xenix FS     little endian  0 1
+  - SystemV FS   little endian  0 1
+  - Coherent FS  little endian  0 1
+
+  Of course, this affects only the file system, not the data of files on it!
+
+* Byte ordering of "long" (32 bit entities) on disk:
+
+  - Minix FS     little endian  0 1 2 3
+  - Xenix FS     little endian  0 1 2 3
+  - SystemV FS   little endian  0 1 2 3
+  - Coherent FS  PDP-11         2 3 0 1
+
+  Of course, this affects only the file system, not the data of files on it!
+
+* Inode number on disk: "short"; 0 means non-existent; the root directory inode number is:
+
+  =================================  ==
+  Minix FS                            1
+  Xenix FS, SystemV FS, Coherent FS   2
+  =================================  ==
+
+* Maximum number of hard links to a file:
+
+  ===========  =========
+  Minix FS     250
+  Xenix FS     ??
+  SystemV FS   ??
+  Coherent FS  >=10000
+  ===========  =========
+
+* Free inode management:
+
+  - Minix FS
+      a bitmap
+  - Xenix FS, SystemV FS, Coherent FS
+      There is a cache of a certain number of free inodes in the super-block.
+      When it is exhausted, new free inodes are found using a linear search.
+
+* Free block management:
+
+  - Minix FS
+      a bitmap
+  - Xenix FS, SystemV FS, Coherent FS
+      Free blocks are organized in a "free list". This may be a misleading
+      term, since it is not true that every free block contains a pointer
+      to the next free block. Rather, the free blocks are organized in
+      chunks of limited size, and every now and then a free block contains
+      pointers to the free blocks pertaining to the next chunk; the first
+      of these contains pointers and so on. The list terminates with a
+      "block number" 0 on Xenix FS and SystemV FS, and with a block zeroed
+      out on Coherent FS.
+
+* Super-block location:
+
+  ===========  ==========================
+  Minix FS     block 1 = bytes 1024..2047
+  Xenix FS     block 1 = bytes 1024..2047
+  SystemV FS   bytes 512..1023
+  Coherent FS  block 1 = bytes 512..1023
+  ===========  ==========================
+
+* Super-block layout:
+
+  - Minix FS::
+
+                    unsigned short s_ninodes;
+                    unsigned short s_nzones;
+                    unsigned short s_imap_blocks;
+                    unsigned short s_zmap_blocks;
+                    unsigned short s_firstdatazone;
+                    unsigned short s_log_zone_size;
+                    unsigned long s_max_size;
+                    unsigned short s_magic;
+
+  - Xenix FS, SystemV FS, Coherent FS::
+
+                    unsigned short s_firstdatazone;
+                    unsigned long  s_nzones;
+                    unsigned short s_fzone_count;
+                    unsigned long  s_fzones[NICFREE];
+                    unsigned short s_finode_count;
+                    unsigned short s_finodes[NICINOD];
+                    char           s_flock;
+                    char           s_ilock;
+                    char           s_modified;
+                    char           s_rdonly;
+                    unsigned long  s_time;
+                    short          s_dinfo[4]; -- SystemV FS only
+                    unsigned long  s_free_zones;
+                    unsigned short s_free_inodes;
+                    short          s_dinfo[4]; -- Xenix FS only
+                    unsigned short s_interleave_m,s_interleave_n; -- Coherent FS only
+                    char           s_fname[6];
+                    char           s_fpack[6];
+
+    then they differ considerably:
+
+        Xenix FS::
+
+                    char           s_clean;
+                    char           s_fill[371];
+                    long           s_magic;
+                    long           s_type;
+
+        SystemV FS::
+
+                    long           s_fill[12 or 14];
+                    long           s_state;
+                    long           s_magic;
+                    long           s_type;
+
+        Coherent FS::
+
+                    unsigned long  s_unique;
+
+    Note that Coherent FS has no magic.
+
+* Inode layout:
+
+  - Minix FS::
+
+                    unsigned short i_mode;
+                    unsigned short i_uid;
+                    unsigned long  i_size;
+                    unsigned long  i_time;
+                    unsigned char  i_gid;
+                    unsigned char  i_nlinks;
+                    unsigned short i_zone[7+1+1];
+
+  - Xenix FS, SystemV FS, Coherent FS::
+
+                    unsigned short i_mode;
+                    unsigned short i_nlink;
+                    unsigned short i_uid;
+                    unsigned short i_gid;
+                    unsigned long  i_size;
+                    unsigned char  i_zone[3*(10+1+1+1)];
+                    unsigned long  i_atime;
+                    unsigned long  i_mtime;
+                    unsigned long  i_ctime;
+
+
+* Regular file data blocks are organized as
+
+  - Minix FS:
+
+             - 7 direct blocks
+             - 1 indirect block (pointers to blocks)
+             - 1 double-indirect block (pointer to pointers to blocks)
+
+  - Xenix FS, SystemV FS, Coherent FS:
+
+             - 10 direct blocks
+             -  1 indirect block (pointers to blocks)
+             -  1 double-indirect block (pointer to pointers to blocks)
+             -  1 triple-indirect block (pointer to pointers to pointers to blocks)
+
+
+  ===========  ==========   ================
+               Inode size   inodes per block
+  ===========  ==========   ================
+  Minix FS        32        32
+  Xenix FS        64        16
+  SystemV FS      64        16
+  Coherent FS     64        8
+  ===========  ==========   ================
+
+* Directory entry on disk
+
+  - Minix FS::
+
+                    unsigned short inode;
+                    char name[14/30];
+
+  - Xenix FS, SystemV FS, Coherent FS::
+
+                    unsigned short inode;
+                    char name[14];
+
+  ===========    ==============    =====================
+                 Dir entry size    dir entries per block
+  ===========    ==============    =====================
+  Minix FS       16/32             64/32
+  Xenix FS       16                64
+  SystemV FS     16                64
+  Coherent FS    16                32
+  ===========    ==============    =====================
+
+* How to implement symbolic links such that the host fsck doesn't scream:
+
+  - Minix FS     normal
+  - Xenix FS     kludge: as regular files with chmod 1000
+  - SystemV FS   ??
+  - Coherent FS  kludge: as regular files with chmod 1000
+
+
+Notation: We often speak of a "block" but mean a zone (the allocation unit)
+and not the disk driver's notion of "block".
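The two 32-bit byte orderings listed above can be illustrated with a pair of small decoding helpers. This is a minimal userspace sketch, not the kernel's sysv code: decode_le32() and decode_pdp32() are hypothetical names, and the notation "2 3 0 1" is read as "on-disk byte offset k holds byte number 2, 3, 0, 1 of the value", counting byte 0 as the least significant.

```c
#include <stdint.h>

/*
 * Decode a 32-bit value stored little endian (byte order 0 1 2 3), as on
 * Minix FS, Xenix FS and SystemV FS.  Byte 0 is the least significant.
 */
static uint32_t decode_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Decode a 32-bit value stored in PDP-11 order (byte order 2 3 0 1), as on
 * Coherent FS: the more significant 16-bit half comes first on disk, and
 * each half is itself stored little endian.
 */
static uint32_t decode_pdp32(const unsigned char *p)
{
	uint32_t high = (uint32_t)p[0] | ((uint32_t)p[1] << 8);
	uint32_t low = (uint32_t)p[2] | ((uint32_t)p[3] << 8);

	return (high << 16) | low;
}
```

For example, the value 0x12345678 is stored as the bytes 78 56 34 12 in little-endian order and as 34 12 78 56 in PDP-11 order; each helper returns 0x12345678 for its own byte sequence.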
diff --git a/Documentation/filesystems/sysv-fs.txt b/Documentation/filesystems/sysv-fs.txt
deleted file mode 100644
index 253b50d1328e..000000000000
--- a/Documentation/filesystems/sysv-fs.txt
+++ /dev/null
@@ -1,197 +0,0 @@
-It implements all of
-  - Xenix FS,
-  - SystemV/386 FS,
-  - Coherent FS.
-
-To install:
-* Answer the 'System V and Coherent filesystem support' question with 'y'
-  when configuring the kernel.
-* To mount a disk or a partition, use
-    mount [-r] -t sysv device mountpoint
-  The file system type names
-               -t sysv
-               -t xenix
-               -t coherent
-  may be used interchangeably, but the last two will eventually disappear.
-
-Bugs in the present implementation:
-- Coherent FS:
-  - The "free list interleave" n:m is currently ignored.
-  - Only file systems with no filesystem name and no pack name are recognized.
-  (See Coherent "man mkfs" for a description of these features.)
-- SystemV Release 2 FS:
-  The superblock is only searched in the blocks 9, 15, 18, which
-  corresponds to the beginning of track 1 on floppy disks. No support
-  for this FS on hard disk yet.
-
-
-These filesystems are rather similar. Here is a comparison with Minix FS:
-
-* Linux fdisk reports on partitions
-  - Minix FS     0x81 Linux/Minix
-  - Xenix FS     ??
-  - SystemV FS   ??
-  - Coherent FS  0x08 AIX bootable
-
-* Size of a block or zone (data allocation unit on disk)
-  - Minix FS     1024
-  - Xenix FS     1024 (also 512 ??)
-  - SystemV FS   1024 (also 512 and 2048)
-  - Coherent FS   512
-
-* General layout: all have one boot block, one super block and
-  separate areas for inodes and for directories/data.
-  On SystemV Release 2 FS (e.g. Microport) the first track is reserved and
-  all the block numbers (including the super block) are offset by one track.
-
-* Byte ordering of "short" (16 bit entities) on disk:
-  - Minix FS     little endian  0 1
-  - Xenix FS     little endian  0 1
-  - SystemV FS   little endian  0 1
-  - Coherent FS  little endian  0 1
-  Of course, this affects only the file system, not the data of files on it!
-
-* Byte ordering of "long" (32 bit entities) on disk:
-  - Minix FS     little endian  0 1 2 3
-  - Xenix FS     little endian  0 1 2 3
-  - SystemV FS   little endian  0 1 2 3
-  - Coherent FS  PDP-11         2 3 0 1
-  Of course, this affects only the file system, not the data of files on it!
-
-* Inode on disk: "short", 0 means non-existent, the root dir ino is:
-  - Minix FS                            1
-  - Xenix FS, SystemV FS, Coherent FS   2
-
-* Maximum number of hard links to a file:
-  - Minix FS     250
-  - Xenix FS     ??
-  - SystemV FS   ??
-  - Coherent FS  >=10000
-
-* Free inode management:
-  - Minix FS                             a bitmap
-  - Xenix FS, SystemV FS, Coherent FS
-      There is a cache of a certain number of free inodes in the super-block.
-      When it is exhausted, new free inodes are found using a linear search.
-
-* Free block management:
-  - Minix FS                             a bitmap
-  - Xenix FS, SystemV FS, Coherent FS
-      Free blocks are organized in a "free list". Maybe a misleading term,
-      since it is not true that every free block contains a pointer to
-      the next free block. Rather, the free blocks are organized in chunks
-      of limited size, and every now and then a free block contains pointers
-      to the free blocks pertaining to the next chunk; the first of these
-      contains pointers and so on. The list terminates with a "block number"
-      0 on Xenix FS and SystemV FS, with a block zeroed out on Coherent FS.
-
-* Super-block location:
-  - Minix FS     block 1 = bytes 1024..2047
-  - Xenix FS     block 1 = bytes 1024..2047
-  - SystemV FS   bytes 512..1023
-  - Coherent FS  block 1 = bytes 512..1023
-
-* Super-block layout:
-  - Minix FS
-                    unsigned short s_ninodes;
-                    unsigned short s_nzones;
-                    unsigned short s_imap_blocks;
-                    unsigned short s_zmap_blocks;
-                    unsigned short s_firstdatazone;
-                    unsigned short s_log_zone_size;
-                    unsigned long s_max_size;
-                    unsigned short s_magic;
-  - Xenix FS, SystemV FS, Coherent FS
-                    unsigned short s_firstdatazone;
-                    unsigned long  s_nzones;
-                    unsigned short s_fzone_count;
-                    unsigned long  s_fzones[NICFREE];
-                    unsigned short s_finode_count;
-                    unsigned short s_finodes[NICINOD];
-                    char           s_flock;
-                    char           s_ilock;
-                    char           s_modified;
-                    char           s_rdonly;
-                    unsigned long  s_time;
-                    short          s_dinfo[4]; -- SystemV FS only
-                    unsigned long  s_free_zones;
-                    unsigned short s_free_inodes;
-                    short          s_dinfo[4]; -- Xenix FS only
-                    unsigned short s_interleave_m,s_interleave_n; -- Coherent FS only
-                    char           s_fname[6];
-                    char           s_fpack[6];
-    then they differ considerably:
-        Xenix FS
-                    char           s_clean;
-                    char           s_fill[371];
-                    long           s_magic;
-                    long           s_type;
-        SystemV FS
-                    long           s_fill[12 or 14];
-                    long           s_state;
-                    long           s_magic;
-                    long           s_type;
-        Coherent FS
-                    unsigned long  s_unique;
-    Note that Coherent FS has no magic.
-
-* Inode layout:
-  - Minix FS
-                    unsigned short i_mode;
-                    unsigned short i_uid;
-                    unsigned long  i_size;
-                    unsigned long  i_time;
-                    unsigned char  i_gid;
-                    unsigned char  i_nlinks;
-                    unsigned short i_zone[7+1+1];
-  - Xenix FS, SystemV FS, Coherent FS
-                    unsigned short i_mode;
-                    unsigned short i_nlink;
-                    unsigned short i_uid;
-                    unsigned short i_gid;
-                    unsigned long  i_size;
-                    unsigned char  i_zone[3*(10+1+1+1)];
-                    unsigned long  i_atime;
-                    unsigned long  i_mtime;
-                    unsigned long  i_ctime;
-
-* Regular file data blocks are organized as
-  - Minix FS
-               7 direct blocks
-               1 indirect block (pointers to blocks)
-               1 double-indirect block (pointer to pointers to blocks)
-  - Xenix FS, SystemV FS, Coherent FS
-              10 direct blocks
-               1 indirect block (pointers to blocks)
-               1 double-indirect block (pointer to pointers to blocks)
-               1 triple-indirect block (pointer to pointers to pointers to blocks)
-
-* Inode size, inodes per block
-  - Minix FS        32   32
-  - Xenix FS        64   16
-  - SystemV FS      64   16
-  - Coherent FS     64    8
-
-* Directory entry on disk
-  - Minix FS
-                    unsigned short inode;
-                    char name[14/30];
-  - Xenix FS, SystemV FS, Coherent FS
-                    unsigned short inode;
-                    char name[14];
-
-* Dir entry size, dir entries per block
-  - Minix FS     16/32    64/32
-  - Xenix FS     16       64
-  - SystemV FS   16       64
-  - Coherent FS  16       32
-
-* How to implement symbolic links such that the host fsck doesn't scream:
-  - Minix FS     normal
-  - Xenix FS     kludge: as regular files with  chmod 1000
-  - SystemV FS   ??
-  - Coherent FS  kludge: as regular files with  chmod 1000
-
-
-Notation: We often speak of a "block" but mean a zone (the allocation unit)
-and not the disk driver's notion of "block".
-- 


From 7e7cd458b8105b02e69e3af2ef4cd186326d7f84 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:26 +0100
Subject: docs: filesystems: convert tmpfs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Use :field: markup;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/30397a47a78ca59760fbc0fc5f50c5f1002d487a.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/tmpfs.rst | 163 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/tmpfs.txt | 149 --------------------------------
 3 files changed, 164 insertions(+), 149 deletions(-)
 create mode 100644 Documentation/filesystems/tmpfs.rst
 delete mode 100644 Documentation/filesystems/tmpfs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index d583b8b35196..27d37e7712da 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -89,5 +89,6 @@ Documentation for filesystem implementations.
    squashfs
    sysfs
    sysv-fs
+   tmpfs
    virtiofs
    vfat
diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst
new file mode 100644
index 000000000000..4e95929301a5
--- /dev/null
+++ b/Documentation/filesystems/tmpfs.rst
@@ -0,0 +1,163 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====
+Tmpfs
+=====
+
+Tmpfs is a file system which keeps all files in virtual memory.
+
+
+Everything in tmpfs is temporary in the sense that no files will be
+created on your hard drive. If you unmount a tmpfs instance,
+everything stored therein is lost.
+
+tmpfs puts everything into the kernel internal caches; it grows and
+shrinks to accommodate the files it contains and is able to swap
+unneeded pages out to swap space. It has maximum size limits which can
+be adjusted on the fly via ``mount -o remount ...``.
+
+Compared to ramfs (which was the template to create tmpfs), tmpfs adds
+swapping and limit checking. Another similar thing is the RAM disk
+(``/dev/ram*``), which simulates a fixed-size hard disk in physical RAM,
+on top of which you have to create an ordinary filesystem. RAM disks
+cannot swap, and they cannot be resized.
+
+Since tmpfs lives completely in the page cache and on swap, all tmpfs
+pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
+free(1). Notice that these counters also include shared memory
+(shmem, see ipcs(1)). The most reliable way to get the count is
+using df(1) and du(1).
+
+tmpfs has the following uses:
+
+1) There is always a kernel internal mount which you will not see at
+   all. This is used for shared anonymous mappings and SYSV shared
+   memory.
+
+   This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not
+   set, the user-visible part of tmpfs is not built, but the internal
+   mechanisms are always present.
+
+2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
+   POSIX shared memory (shm_open, shm_unlink). Adding the following
+   line to /etc/fstab should take care of this::
+
+	tmpfs	/dev/shm	tmpfs	defaults	0 0
+
+   Remember to create the directory that you intend to mount tmpfs on
+   if necessary.
+
+   This mount is *not* needed for SYSV shared memory; the internal
+   mount is used for that. (In the 2.3 kernel versions it was
+   necessary to mount the predecessor of tmpfs (shm fs) to use SYSV
+   shared memory.)
+
+3) Some people (including me) find it very convenient to mount it
+   on, e.g., /tmp and /var/tmp, combined with a big swap partition.
+   Loop mounts of tmpfs files now work, so the mkinitrd shipped by most
+   distributions should succeed with a tmpfs /tmp.
+
+4) And probably a lot more I do not know about :-)
+
+
+tmpfs has three mount options for sizing:
+
+=========  ============================================================
+size       The limit of allocated bytes for this tmpfs instance. The
+           default is half of your physical RAM without swap. If you
+           oversize your tmpfs instances, the machine will deadlock
+           since the OOM handler will not be able to free that memory.
+nr_blocks  The same as size, but in blocks of PAGE_SIZE.
+nr_inodes  The maximum number of inodes for this instance. The default
+           is half of the number of your physical RAM pages, or (on a
+           machine with highmem) the number of lowmem RAM pages,
+           whichever is the lower.
+=========  ============================================================
+
+These parameters accept a suffix k, m or g for kilo, mega and giga and
+can be changed on remount.  The size parameter also accepts a suffix %
+to limit this tmpfs instance to that percentage of your physical RAM;
+the default, when neither size nor nr_blocks is specified, is size=50%.
+
+If nr_blocks=0 (or size=0), blocks will not be limited in that instance;
+if nr_inodes=0, inodes will not be limited.  It is generally unwise to
+mount with such options, since they allow any user with write access to
+use up all the memory on the machine, but they enhance the scalability of
+that instance on a system with many CPUs making intensive use of it.
+
+
+tmpfs has a mount option to set the NUMA memory allocation policy for
+all files in that instance (if CONFIG_NUMA is enabled), which can be
+adjusted on the fly via ``mount -o remount ...``:
+
+======================== ==============================================
+mpol=default             use the process allocation policy
+                         (see set_mempolicy(2))
+mpol=prefer:Node         prefers to allocate memory from the given Node
+mpol=bind:NodeList       allocates memory only from nodes in NodeList
+mpol=interleave          prefers to allocate from each node in turn
+mpol=interleave:NodeList allocates from each node of NodeList in turn
+mpol=local               prefers to allocate memory from the local node
+======================== ==============================================
+
+NodeList format is a comma-separated list of decimal numbers and ranges,
+a range being two hyphen-separated decimal numbers, the smallest and
+largest node numbers in the range; for example, ``mpol=bind:0-3,5,7,9-15``.
+
+A memory policy with a valid NodeList will be saved, as specified, for
+use at file creation time.  When a task allocates a file in the file
+system, the mount option memory policy will be applied with a NodeList,
+if any, modified by the calling task's cpuset constraints
+(see Documentation/admin-guide/cgroup-v1/cpusets.rst) and any optional
+flags, listed below.  If the resulting NodeList is the empty set, the
+effective memory policy for the file will revert to "default" policy.
+
+NUMA memory allocation policies have optional flags that can be used in
+conjunction with their modes.  These optional flags can be specified
+when tmpfs is mounted by appending them to the mode before the NodeList.
+See Documentation/admin-guide/mm/numa_memory_policy.rst for a list of
+all available memory allocation policy mode flags and their effect on
+memory policy.
+
+::
+
+	=static		is equivalent to	MPOL_F_STATIC_NODES
+	=relative	is equivalent to	MPOL_F_RELATIVE_NODES
+
+For example, ``mpol=bind=static:NodeList`` is the equivalent of an
+allocation policy of ``MPOL_BIND | MPOL_F_STATIC_NODES``.
+
+Note that trying to mount a tmpfs with an mpol option will fail if the
+running kernel does not support NUMA, and it will fail if its NodeList
+specifies a node which is not online.  If your system relies on that
+tmpfs being mounted, but from time to time runs a kernel built without
+NUMA capability (perhaps a safe recovery kernel), or with fewer nodes
+online, then it is advisable to omit the mpol option from automatic
+mount options.  It can be added later, when the tmpfs is already mounted
+on MountPoint, by ``mount -o remount,mpol=Policy:NodeList MountPoint``.
+
+
+To specify the initial root directory you can use the following mount
+options:
+
+====  ==================================
+mode  The permissions as an octal number
+uid   The user id
+gid   The group id
+====  ==================================
+
+These options do not have any effect on remount. You can change these
+parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem.
+
+
+So ``mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs``
+will give you a tmpfs instance on /mytmpfs which can allocate 10GB
+of RAM/swap in 10240 inodes and which is only accessible by root.
+
+
+:Author:
+   Christoph Rohland <cr@sap.com>, 1.12.01
+:Updated:
+   Hugh Dickins, 4 June 2007
+:Updated:
+   KOSAKI Motohiro, 16 Mar 2010
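The NodeList syntax described in the tmpfs document (comma-separated decimal node numbers and "smallest-largest" ranges) can be illustrated with a small userspace parser. This is a minimal sketch, not the kernel's own parser: parse_nodelist() is a hypothetical name, and it assumes at most 64 nodes so that the result fits in a uint64_t bitmask.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a NodeList such as "0-3,5,7,9-15" into a
 * bitmask in which bit N is set when node N is listed.  Only node numbers
 * 0..63 fit in the mask.  Returns 0 on success, -1 on malformed input.
 */
static int parse_nodelist(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	for (;;) {
		char *end;
		unsigned long lo, hi;

		/* Each element must start with a decimal node number. */
		if (!isdigit((unsigned char)*s))
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;
		if (*s == '-') {
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		/* A range names the smallest node first, then the largest. */
		if (lo > hi || hi > 63)
			return -1;
		for (unsigned long n = lo; n <= hi; n++)
			m |= UINT64_C(1) << n;
		if (*s == '\0')
			break;
		if (*s != ',')
			return -1;
		s++;
	}
	*mask = m;
	return 0;
}
```

With this sketch, "0-3,5,7,9-15" sets bits 0 to 3, 5, 7 and 9 to 15 (mask 0xFEAF), while a descending range such as "3-1" or a trailing comma is rejected.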
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
deleted file mode 100644
index 5ecbc03e6b2f..000000000000
--- a/Documentation/filesystems/tmpfs.txt
+++ /dev/null
@@ -1,149 +0,0 @@
-Tmpfs is a file system which keeps all files in virtual memory.
-
-
-Everything in tmpfs is temporary in the sense that no files will be
-created on your hard drive. If you unmount a tmpfs instance,
-everything stored therein is lost.
-
-tmpfs puts everything into the kernel internal caches and grows and
-shrinks to accommodate the files it contains and is able to swap
-unneeded pages out to swap space. It has maximum size limits which can
-be adjusted on the fly via 'mount -o remount ...'
-
-If you compare it to ramfs (which was the template to create tmpfs)
-you gain swapping and limit checking. Another similar thing is the RAM
-disk (/dev/ram*), which simulates a fixed size hard disk in physical
-RAM, where you have to create an ordinary filesystem on top. Ramdisks
-cannot swap and you do not have the possibility to resize them. 
-
-Since tmpfs lives completely in the page cache and on swap, all tmpfs
-pages will be shown as "Shmem" in /proc/meminfo and "Shared" in
-free(1). Notice that these counters also include shared memory
-(shmem, see ipcs(1)). The most reliable way to get the count is
-using df(1) and du(1).
-
-tmpfs has the following uses:
-
-1) There is always a kernel internal mount which you will not see at
-   all. This is used for shared anonymous mappings and SYSV shared
-   memory. 
-
-   This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not
-   set, the user visible part of tmpfs is not build. But the internal
-   mechanisms are always present.
-
-2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
-   POSIX shared memory (shm_open, shm_unlink). Adding the following
-   line to /etc/fstab should take care of this:
-
-	tmpfs	/dev/shm	tmpfs	defaults	0 0
-
-   Remember to create the directory that you intend to mount tmpfs on
-   if necessary.
-
-   This mount is _not_ needed for SYSV shared memory. The internal
-   mount is used for that. (In the 2.3 kernel versions it was
-   necessary to mount the predecessor of tmpfs (shm fs) to use SYSV
-   shared memory)
-
-3) Some people (including me) find it very convenient to mount it
-   e.g. on /tmp and /var/tmp and have a big swap partition. And now
-   loop mounts of tmpfs files do work, so mkinitrd shipped by most
-   distributions should succeed with a tmpfs /tmp.
-
-4) And probably a lot more I do not know about :-)
-
-
-tmpfs has three mount options for sizing:
-
-size:      The limit of allocated bytes for this tmpfs instance. The 
-           default is half of your physical RAM without swap. If you
-           oversize your tmpfs instances the machine will deadlock
-           since the OOM handler will not be able to free that memory.
-nr_blocks: The same as size, but in blocks of PAGE_SIZE.
-nr_inodes: The maximum number of inodes for this instance. The default
-           is half of the number of your physical RAM pages, or (on a
-           machine with highmem) the number of lowmem RAM pages,
-           whichever is the lower.
-
-These parameters accept a suffix k, m or g for kilo, mega and giga and
-can be changed on remount.  The size parameter also accepts a suffix %
-to limit this tmpfs instance to that percentage of your physical RAM:
-the default, when neither size nor nr_blocks is specified, is size=50%
-
-If nr_blocks=0 (or size=0), blocks will not be limited in that instance;
-if nr_inodes=0, inodes will not be limited.  It is generally unwise to
-mount with such options, since it allows any user with write access to
-use up all the memory on the machine; but enhances the scalability of
-that instance in a system with many cpus making intensive use of it.
-
-
-tmpfs has a mount option to set the NUMA memory allocation policy for
-all files in that instance (if CONFIG_NUMA is enabled) - which can be
-adjusted on the fly via 'mount -o remount ...'
-
-mpol=default             use the process allocation policy
-                         (see set_mempolicy(2))
-mpol=prefer:Node         prefers to allocate memory from the given Node
-mpol=bind:NodeList       allocates memory only from nodes in NodeList
-mpol=interleave          prefers to allocate from each node in turn
-mpol=interleave:NodeList allocates from each node of NodeList in turn
-mpol=local		 prefers to allocate memory from the local node
-
-NodeList format is a comma-separated list of decimal numbers and ranges,
-a range being two hyphen-separated decimal numbers, the smallest and
-largest node numbers in the range.  For example, mpol=bind:0-3,5,7,9-15
-
-A memory policy with a valid NodeList will be saved, as specified, for
-use at file creation time.  When a task allocates a file in the file
-system, the mount option memory policy will be applied with a NodeList,
-if any, modified by the calling task's cpuset constraints
-[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags, listed
-below.  If the resulting NodeLists is the empty set, the effective memory
-policy for the file will revert to "default" policy.
-
-NUMA memory allocation policies have optional flags that can be used in
-conjunction with their modes.  These optional flags can be specified
-when tmpfs is mounted by appending them to the mode before the NodeList.
-See Documentation/admin-guide/mm/numa_memory_policy.rst for a list of
-all available memory allocation policy mode flags and their effect on
-memory policy.
-
-	=static		is equivalent to	MPOL_F_STATIC_NODES
-	=relative	is equivalent to	MPOL_F_RELATIVE_NODES
-
-For example, mpol=bind=static:NodeList, is the equivalent of an
-allocation policy of MPOL_BIND | MPOL_F_STATIC_NODES.
-
-Note that trying to mount a tmpfs with an mpol option will fail if the
-running kernel does not support NUMA; and will fail if its nodelist
-specifies a node which is not online.  If your system relies on that
-tmpfs being mounted, but from time to time runs a kernel built without
-NUMA capability (perhaps a safe recovery kernel), or with fewer nodes
-online, then it is advisable to omit the mpol option from automatic
-mount options.  It can be added later, when the tmpfs is already mounted
-on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
-
-
-To specify the initial root directory you can use the following mount
-options:
-
-mode:	The permissions as an octal number
-uid:	The user id 
-gid:	The group id
-
-These options do not have any effect on remount. You can change these
-parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem.
-
-
-So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
-will give you tmpfs instance on /mytmpfs which can allocate 10GB
-RAM/SWAP in 10240 inodes and it is only accessible by root.
-
-
-Author:
-   Christoph Rohland <cr@sap.com>, 1.12.01
-Updated:
-   Hugh Dickins, 4 June 2007
-Updated:
-   KOSAKI Motohiro, 16 Mar 2010
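The suffix rules for the tmpfs sizing options (k, m or g, and % of physical RAM for size) can be illustrated with a userspace sketch. It assumes binary multipliers, which is consistent with the nr_inodes=10k giving 10240 inodes example, and it takes the physical RAM size as a parameter; parse_size_option() is a hypothetical name, not a kernel function.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of how a tmpfs sizing value could be interpreted:
 * a decimal number optionally followed by k, m or g (binary multiples)
 * or by % of physical RAM.  ram_bytes stands in for the machine's
 * physical RAM size.  Returns 0 on success, -1 on malformed input or
 * overflow.
 */
static int parse_size_option(const char *s, uint64_t ram_bytes, uint64_t *bytes)
{
	char *end;
	uint64_t n;
	unsigned int shift = 0;

	if (!isdigit((unsigned char)*s))
		return -1;
	errno = 0;
	n = strtoull(s, &end, 10);
	if (errno == ERANGE)
		return -1;

	if (*end == '%') {
		/* Percentage of physical RAM, e.g. the default size=50%. */
		if (end[1] != '\0' || (n != 0 && ram_bytes > UINT64_MAX / n))
			return -1;
		*bytes = ram_bytes * n / 100;
		return 0;
	}

	switch (*end) {
	case '\0':
		break;
	case 'k': case 'K':
		shift = 10;
		break;
	case 'm': case 'M':
		shift = 20;
		break;
	case 'g': case 'G':
		shift = 30;
		break;
	default:
		return -1;
	}
	/* At most one suffix character is allowed. */
	if (*end != '\0' && end[1] != '\0')
		return -1;
	if (n > (UINT64_MAX >> shift))
		return -1;
	*bytes = n << shift;
	return 0;
}
```

Under these assumptions, size=10G yields 10737418240 bytes, 10k yields 10240, and size=50% on a machine with 8 GiB of RAM yields 4 GiB.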
-- 


From 688f118e3139f81f813ba1896931cf8fad93430d Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:27 +0100
Subject: docs: filesystems: convert ubifs-authentication.rst.txt to ReST

- Add a SPDX header;
- Mark some literals as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/0c36091b6660cd372f994bd98e1264491d766c22.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst                |  1 +
 Documentation/filesystems/ubifs-authentication.rst | 10 ++++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 27d37e7712da..bb14738df358 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -90,5 +90,6 @@ Documentation for filesystem implementations.
    sysfs
    sysv-fs
    tmpfs
+   ubifs-authentication.rst
    virtiofs
    vfat
diff --git a/Documentation/filesystems/ubifs-authentication.rst b/Documentation/filesystems/ubifs-authentication.rst
index 6a9584f6ff46..16efd729bf7c 100644
--- a/Documentation/filesystems/ubifs-authentication.rst
+++ b/Documentation/filesystems/ubifs-authentication.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 :orphan:
 
 .. UBIFS Authentication
@@ -92,11 +94,11 @@ UBIFS Index & Tree Node Cache
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Basic on-flash UBIFS entities are called *nodes*. UBIFS knows different types
-of nodes. Eg. data nodes (`struct ubifs_data_node`) which store chunks of file
-contents or inode nodes (`struct ubifs_ino_node`) which represent VFS inodes.
-Almost all types of nodes share a common header (`ubifs_ch`) containing basic
+of nodes, e.g. data nodes (``struct ubifs_data_node``) which store chunks of file
+contents or inode nodes (``struct ubifs_ino_node``) which represent VFS inodes.
+Almost all types of nodes share a common header (``ubifs_ch``) containing basic
 information like node type, node length, a sequence number, etc. (see
-`fs/ubifs/ubifs-media.h`in kernel source). Exceptions are entries of the LPT
+``fs/ubifs/ubifs-media.h`` in kernel source). Exceptions are entries of the LPT
 and some less important node types like padding nodes which are used to pad
 unusable content at the end of LEBs.
 
-- 


From 38e56b4ec44139b5781d6ff13f1b422e4b38f0d4 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:28 +0100
Subject: docs: filesystems: convert ubifs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Adjust section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add table markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/9043dc2965cafc64e6a521e2317c00ecc8303bf6.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |   1 +
 Documentation/filesystems/ubifs.rst | 137 ++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/ubifs.txt | 126 ---------------------------------
 3 files changed, 138 insertions(+), 126 deletions(-)
 create mode 100644 Documentation/filesystems/ubifs.rst
 delete mode 100644 Documentation/filesystems/ubifs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index bb14738df358..58d57c9bf922 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -90,6 +90,7 @@ Documentation for filesystem implementations.
    sysfs
    sysv-fs
    tmpfs
+   ubifs
    ubifs-authentication.rst
    virtiofs
    vfat
diff --git a/Documentation/filesystems/ubifs.rst b/Documentation/filesystems/ubifs.rst
new file mode 100644
index 000000000000..e6ee99762534
--- /dev/null
+++ b/Documentation/filesystems/ubifs.rst
@@ -0,0 +1,137 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+UBI File System
+===============
+
+Introduction
+============
+
+UBIFS stands for UBI File System. UBI stands for "Unsorted
+Block Images". UBIFS is a flash file system, which means it is designed
+to work with flash devices. It is important to understand that UBIFS
+is completely different from any traditional file-system in Linux, like
+Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems
+which work with MTD devices, not block devices. The other Linux
+file-system of this class is JFFS2.
+
+To make this clearer, here is a small comparison of MTD devices and
+block devices.
+
+1. MTD devices represent flash devices and they consist of eraseblocks of
+   rather large size, typically about 128KiB. Block devices consist of
+   small blocks, typically 512 bytes.
+2. MTD devices support 3 main operations - read from some offset within an
+   eraseblock, write to some offset within an eraseblock, and erase a whole
+   eraseblock. Block devices support 2 main operations - read a whole
+   block and write a whole block.
+3. The whole eraseblock has to be erased before it becomes possible to
+   re-write its contents. Blocks can simply be re-written.
+4. Eraseblocks become worn out after some number of erase cycles -
+   typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC
+   NAND flashes. Blocks do not have the wear-out property.
+5. Eraseblocks may become bad (only on NAND flashes) and software should
+   deal with this. Blocks on hard drives typically do not become bad,
+   because hardware has mechanisms to substitute bad blocks, at least in
+   modern LBA disks.
+
+These differences explain why UBIFS is so different from traditional
+file-systems.
+
+UBIFS works on top of UBI. UBI is a separate software layer which can be
+found in ``drivers/mtd/ubi``. UBI is basically a volume management and
+wear-leveling layer. It provides so-called UBI volumes, which are a
+higher-level abstraction than an MTD device. The programming model of UBI
+devices is very similar to that of MTD devices - they still consist of
+large eraseblocks and have read/write/erase operations, but UBI devices are
+free of limitations such as wear and bad blocks (items 4 and 5 above).
+
+In a sense, UBIFS is the next generation of the JFFS2 file-system, but it is
+very different from, and incompatible with, JFFS2. The main differences are
+the following:
+
+* JFFS2 works on top of MTD devices, while UBIFS depends on UBI and works on
+  top of UBI volumes.
+* JFFS2 does not have an on-media index and has to build it while mounting,
+  which requires a full media scan. UBIFS maintains the FS indexing
+  information on the flash media and does not require a full media scan,
+  so it mounts many times faster than JFFS2.
+* JFFS2 is a write-through file-system, while UBIFS supports write-back,
+  which makes UBIFS much faster on writes.
+
+Like JFFS2, UBIFS supports on-the-fly compression, which makes it
+possible to fit quite a lot of data onto the flash.
+
+Like JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
+It does not need a tool like fsck.ext2. UBIFS automatically replays its
+journal and recovers from crashes, ensuring that the on-flash data
+structures are consistent.
+
+UBIFS scales logarithmically (most of the data structures it uses are
+trees), so the mount time and memory consumption do not depend linearly
+on the flash size, as they do in the case of JFFS2. This is because UBIFS
+maintains the FS index on the flash media. However, UBIFS depends on
+UBI, which scales linearly, so the overall UBI/UBIFS stack scales linearly.
+Nevertheless, UBI/UBIFS scales considerably better than JFFS2.
+
+The authors of UBIFS believe that it is possible to develop UBI2, which
+would scale logarithmically as well. UBI2 would support the same API as UBI,
+but it would be binary-incompatible with UBI. Since the API would be the
+same, UBIFS would not need to be changed to use UBI2.
+
+
+Mount options
+=============
+
+(*) == default.
+
+====================	=======================================================
+bulk_read		read more in one go to take advantage of flash
+			media that read faster sequentially
+no_bulk_read (*)	do not bulk-read
+no_chk_data_crc (*)	skip checking of CRCs on data nodes in order to
+			improve read performance. Use this option only
+			if the flash media is highly reliable. The effect
+			of this option is that corruption of the contents
+			of a file can go unnoticed.
+chk_data_crc		do not skip checking CRCs on data nodes
+compr=none              override default compressor and set it to "none"
+compr=lzo               override default compressor and set it to "lzo"
+compr=zlib              override default compressor and set it to "zlib"
+auth_key=		specify the key used for authenticating the filesystem.
+			Passing this option makes authentication mandatory.
+			The passed key must be present in the kernel keyring
+			and must be of type 'logon'.
+auth_hash_name=		the hash algorithm used for authentication. Used for
+			both hashing and for creating HMACs. Typical values
+			include "sha256" or "sha512".
+====================	=======================================================
+
+
+Quick usage instructions
+========================
+
+The UBI volume to mount is specified using ``ubiX_Y`` or ``ubiX:NAME`` syntax,
+where ``X`` is the UBI device number, ``Y`` is the UBI volume number, and
+``NAME`` is the UBI volume name.
+
+Mount volume 0 on UBI device 0 to /mnt/ubifs::
+
+    $ mount -t ubifs ubi0_0 /mnt/ubifs
+
+Mount the "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is the
+volume name)::
+
+    $ mount -t ubifs ubi0:rootfs /mnt/ubifs
+
+The following kernel boot arguments attach mtd0 to UBI and mount volume
+"rootfs" as the root file system:
+``ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs``
+
+References
+==========
+
+UBIFS documentation and FAQ/HOWTO at the MTD web site:
+
+- http://www.linux-mtd.infradead.org/doc/ubifs.html
+- http://www.linux-mtd.infradead.org/faq/ubifs.html
diff --git a/Documentation/filesystems/ubifs.txt b/Documentation/filesystems/ubifs.txt
deleted file mode 100644
index acc80442a3bb..000000000000
--- a/Documentation/filesystems/ubifs.txt
+++ /dev/null
@@ -1,126 +0,0 @@
-Introduction
-=============
-
-UBIFS file-system stands for UBI File System. UBI stands for "Unsorted
-Block Images". UBIFS is a flash file system, which means it is designed
-to work with flash devices. It is important to understand, that UBIFS
-is completely different to any traditional file-system in Linux, like
-Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems
-which work with MTD devices, not block devices. The other Linux
-file-system of this class is JFFS2.
-
-To make it more clear, here is a small comparison of MTD devices and
-block devices.
-
-1 MTD devices represent flash devices and they consist of eraseblocks of
-  rather large size, typically about 128KiB. Block devices consist of
-  small blocks, typically 512 bytes.
-2 MTD devices support 3 main operations - read from some offset within an
-  eraseblock, write to some offset within an eraseblock, and erase a whole
-  eraseblock. Block  devices support 2 main operations - read a whole
-  block and write a whole block.
-3 The whole eraseblock has to be erased before it becomes possible to
-  re-write its contents. Blocks may be just re-written.
-4 Eraseblocks become worn out after some number of erase cycles -
-  typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC
-  NAND flashes. Blocks do not have the wear-out property.
-5 Eraseblocks may become bad (only on NAND flashes) and software should
-  deal with this. Blocks on hard drives typically do not become bad,
-  because hardware has mechanisms to substitute bad blocks, at least in
-  modern LBA disks.
-
-It should be quite obvious why UBIFS is very different to traditional
-file-systems.
-
-UBIFS works on top of UBI. UBI is a separate software layer which may be
-found in drivers/mtd/ubi. UBI is basically a volume management and
-wear-leveling layer. It provides so called UBI volumes which is a higher
-level abstraction than a MTD device. The programming model of UBI devices
-is very similar to MTD devices - they still consist of large eraseblocks,
-they have read/write/erase operations, but UBI devices are devoid of
-limitations like wear and bad blocks (items 4 and 5 in the above list).
-
-In a sense, UBIFS is a next generation of JFFS2 file-system, but it is
-very different and incompatible to JFFS2. The following are the main
-differences.
-
-* JFFS2 works on top of MTD devices, UBIFS depends on UBI and works on
-  top of UBI volumes.
-* JFFS2 does not have on-media index and has to build it while mounting,
-  which requires full media scan. UBIFS maintains the FS indexing
-  information on the flash media and does not require full media scan,
-  so it mounts many times faster than JFFS2.
-* JFFS2 is a write-through file-system, while UBIFS supports write-back,
-  which makes UBIFS much faster on writes.
-
-Similarly to JFFS2, UBIFS supports on-the-flight compression which makes
-it possible to fit quite a lot of data to the flash.
-
-Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
-It does not need stuff like fsck.ext2. UBIFS automatically replays its
-journal and recovers from crashes, ensuring that the on-flash data
-structures are consistent.
-
-UBIFS scales logarithmically (most of the data structures it uses are
-trees), so the mount time and memory consumption do not linearly depend
-on the flash size, like in case of JFFS2. This is because UBIFS
-maintains the FS index on the flash media. However, UBIFS depends on
-UBI, which scales linearly. So overall UBI/UBIFS stack scales linearly.
-Nevertheless, UBI/UBIFS scales considerably better than JFFS2.
-
-The authors of UBIFS believe, that it is possible to develop UBI2 which
-would scale logarithmically as well. UBI2 would support the same API as UBI,
-but it would be binary incompatible to UBI. So UBIFS would not need to be
-changed to use UBI2
-
-
-Mount options
-=============
-
-(*) == default.
-
-bulk_read		read more in one go to take advantage of flash
-			media that read faster sequentially
-no_bulk_read (*)	do not bulk-read
-no_chk_data_crc (*)	skip checking of CRCs on data nodes in order to
-			improve read performance. Use this option only
-			if the flash media is highly reliable. The effect
-			of this option is that corruption of the contents
-			of a file can go unnoticed.
-chk_data_crc		do not skip checking CRCs on data nodes
-compr=none              override default compressor and set it to "none"
-compr=lzo               override default compressor and set it to "lzo"
-compr=zlib              override default compressor and set it to "zlib"
-auth_key=		specify the key used for authenticating the filesystem.
-			Passing this option makes authentication mandatory.
-			The passed key must be present in the kernel keyring
-			and must be of type 'logon'
-auth_hash_name=		The hash algorithm used for authentication. Used for
-			both hashing and for creating HMACs. Typical values
-			include "sha256" or "sha512"
-
-
-Quick usage instructions
-========================
-
-The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
-where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is
-UBI volume name.
-
-Mount volume 0 on UBI device 0 to /mnt/ubifs:
-$ mount -t ubifs ubi0_0 /mnt/ubifs
-
-Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume
-name):
-$ mount -t ubifs ubi0:rootfs /mnt/ubifs
-
-The following is an example of the kernel boot arguments to attach mtd0
-to UBI and mount volume "rootfs":
-ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs
-
-References
-==========
-
-UBIFS documentation and FAQ/HOWTO at the MTD web site:
-http://www.linux-mtd.infradead.org/doc/ubifs.html
-http://www.linux-mtd.infradead.org/faq/ubifs.html
-- 


From c9817ad5d82f04fbc66278eda27bff094dcb3119 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:29 +0100
Subject: docs: filesystems: convert udf.txt to ReST

- Add a SPDX header;
- Add a document title;
- Add table markups;
- Add lists markups;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/2887f8a3a813a31170389eab687e9f199327dc7d.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst |  1 +
 Documentation/filesystems/udf.rst   | 75 +++++++++++++++++++++++++++++++++++++
 Documentation/filesystems/udf.txt   | 66 --------------------------------
 3 files changed, 76 insertions(+), 66 deletions(-)
 create mode 100644 Documentation/filesystems/udf.rst
 delete mode 100644 Documentation/filesystems/udf.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 58d57c9bf922..ec03cb4d7353 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -92,5 +92,6 @@ Documentation for filesystem implementations.
    tmpfs
    ubifs
    ubifs-authentication.rst
+   udf
    virtiofs
    vfat
diff --git a/Documentation/filesystems/udf.rst b/Documentation/filesystems/udf.rst
new file mode 100644
index 000000000000..d9badbf285b2
--- /dev/null
+++ b/Documentation/filesystems/udf.rst
@@ -0,0 +1,75 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+UDF file system
+===============
+
+If you encounter problems with reading UDF discs using this driver,
+please report them according to the MAINTAINERS file.
+
+Write support requires a block driver which supports writing.  Currently
+dvd+rw drives and media support true random sector writes, and so a udf
+filesystem on such devices can be directly mounted read/write.  CD-RW
+media, however, does not support this.  Instead, the media can be formatted
+for packet mode using the utility cdrwtool, then the pktcdvd driver can
+be bound to the underlying cd device to provide the required buffering
+and read-modify-write cycles to allow the filesystem random sector writes
+while providing the hardware with only full packet writes.  While not
+required for dvd+rw media, use of the pktcdvd driver often enhances
+performance due to very poor read-modify-write support supplied internally
+by drive firmware.
+
+-------------------------------------------------------------------------------
+
+The following mount options are supported:
+
+	===========	======================================
+	gid=		Set the default group.
+	umask=		Set the default umask.
+	mode=		Set the default file permissions.
+	dmode=		Set the default directory permissions.
+	uid=		Set the default user.
+	bs=		Set the block size.
+	unhide		Show otherwise hidden files.
+	undelete	Show deleted files in lists.
+	adinicb		Embed data in the inode (default)
+	noadinicb	Don't embed data in the inode
+	shortad		Use short allocation descriptors
+	longad		Use long allocation descriptors (default)
+	nostrict	Unset strict conformance
+	iocharset=	Set the NLS character set
+	===========	======================================
+
+The uid= and gid= options need a bit more explaining.  They accept a
+decimal numeric value, and all inodes on that mount will then appear as
+belonging to that uid and gid.  These options also accept the string "forget".
+The forget option causes all IDs to be written to disk as -1, which is how
+the UDF standard indicates that IDs are not supported for these files.
+
+For typical desktop use of removable media, you should set the ID to that of
+the interactively logged-in user, and also specify the forget option.  This way
+the interactive user will always see the files on the disk as belonging to them.
+
+The remaining options are for debugging and disaster recovery:
+
+	=====		================================
+	novrs		Skip volume sequence recognition
+	=====		================================
+
+The following options expect an offset from 0:
+
+	==========	=================================================
+	session=	Set the CDROM session (default: last session)
+	anchor=		Override the standard anchor location (default: 256)
+	lastblock=	Set the last block of the filesystem.
+	==========	=================================================
+
+-------------------------------------------------------------------------------
+
+
+For the latest version and toolset see:
+	https://github.com/pali/udftools
+
+Documentation on UDF and ECMA 167 is freely available from:
+	- http://www.osta.org/
+	- http://www.ecma-international.org/
diff --git a/Documentation/filesystems/udf.txt b/Documentation/filesystems/udf.txt
deleted file mode 100644
index e2f2faf32f18..000000000000
--- a/Documentation/filesystems/udf.txt
+++ /dev/null
@@ -1,66 +0,0 @@
-*
-* Documentation/filesystems/udf.txt
-*
-
-If you encounter problems with reading UDF discs using this driver,
-please report them according to MAINTAINERS file.
-
-Write support requires a block driver which supports writing.  Currently
-dvd+rw drives and media support true random sector writes, and so a udf
-filesystem on such devices can be directly mounted read/write.  CD-RW
-media however, does not support this.  Instead the media can be formatted
-for packet mode using the utility cdrwtool, then the pktcdvd driver can
-be bound to the underlying cd device to provide the required buffering
-and read-modify-write cycles to allow the filesystem random sector writes
-while providing the hardware with only full packet writes.  While not
-required for dvd+rw media, use of the pktcdvd driver often enhances
-performance due to very poor read-modify-write support supplied internally
-by drive firmware.
-
--------------------------------------------------------------------------------
-The following mount options are supported:
-
-	gid=		Set the default group.
-	umask=		Set the default umask.
-	mode=		Set the default file permissions.
-	dmode=		Set the default directory permissions.
-	uid=		Set the default user.
-	bs=		Set the block size.
-	unhide		Show otherwise hidden files.
-	undelete	Show deleted files in lists.
-	adinicb		Embed data in the inode (default)
-	noadinicb	Don't embed data in the inode
-	shortad		Use short ad's
-	longad		Use long ad's (default)
-	nostrict	Unset strict conformance
-	iocharset=	Set the NLS character set
-
-The uid= and gid= options need a bit more explaining.  They will accept a
-decimal numeric value and all inodes on that mount will then appear as
-belonging to that uid and gid.  Mount options also accept the string "forget".
-The forget option causes all IDs to be written to disk as -1 which is a way
-of UDF standard to indicate that IDs are not supported for these files .
-
-For typical desktop use of removable media, you should set the ID to that of
-the interactively logged on user, and also specify the forget option.  This way
-the interactive user will always see the files on the disk as belonging to him.
-
-The remaining are for debugging and disaster recovery:
-
-	novrs		Skip volume sequence recognition 
-
-The following expect a offset from 0.
-
-	session=	Set the CDROM session (default= last session)
-	anchor=		Override standard anchor location. (default= 256)
-	lastblock=	Set the last block of the filesystem/
-
--------------------------------------------------------------------------------
-
-
-For the latest version and toolset see:
-	https://github.com/pali/udftools
-
-Documentation on UDF and ECMA 167 is available FREE from:
-	http://www.osta.org/
-	http://www.ecma-international.org/
-- 


From 9a6108124c1d27192fee6f058b5de84f51ab62a0 Mon Sep 17 00:00:00 2001
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date: Mon, 17 Feb 2020 17:12:30 +0100
Subject: docs: filesystems: convert zonefs.txt to ReST

- Add a SPDX header;
- Add a document title;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Link: https://lore.kernel.org/r/42a7cfcd19f6b904a9a3188fd4af71bed5050052.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/filesystems/index.rst  |   1 +
 Documentation/filesystems/zonefs.rst | 412 +++++++++++++++++++++++++++++++++++
 Documentation/filesystems/zonefs.txt | 404 ----------------------------------
 3 files changed, 413 insertions(+), 404 deletions(-)
 create mode 100644 Documentation/filesystems/zonefs.rst
 delete mode 100644 Documentation/filesystems/zonefs.txt

diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index ec03cb4d7353..53f46a88e6ec 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -95,3 +95,4 @@ Documentation for filesystem implementations.
    udf
    virtiofs
    vfat
+   zonefs
diff --git a/Documentation/filesystems/zonefs.rst b/Documentation/filesystems/zonefs.rst
new file mode 100644
index 000000000000..7e733e751e98
--- /dev/null
+++ b/Documentation/filesystems/zonefs.rst
@@ -0,0 +1,412 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================================
+ZoneFS - Zone filesystem for Zoned block devices
+================================================
+
+Introduction
+============
+
+zonefs is a very simple file system exposing each zone of a zoned block device
+as a file. Unlike a regular POSIX-compliant file system with native zoned block
+device support (e.g. f2fs), zonefs does not hide the sequential write
+constraint of zoned block devices from the user. Files representing sequential
+write zones of the device must be written sequentially starting from the end
+of the file (append-only writes).
+
+As such, zonefs is in essence closer to a raw block device access interface
+than to a full-featured POSIX file system. The goal of zonefs is to simplify
+the implementation of zoned block device support in applications by replacing
+raw block device file accesses with a richer file API, avoiding relying on
+direct block device file ioctls which may be more obscure to developers. One
+example of this approach is the implementation of LSM (log-structured merge)
+tree structures (such as used in RocksDB and LevelDB) on zoned block devices
+by allowing SSTables to be stored in a zone file similarly to a regular file
+system rather than as a range of sectors of the entire disk. The introduction
+of the higher level construct "one file is one zone" can help reduce the
+amount of changes needed in the application as well as help introduce support
+for different application programming languages.
+
+Zoned block devices
+-------------------
+
+Zoned storage devices belong to a class of storage devices with an address
+space that is divided into zones. A zone is a group of consecutive LBAs and all
+zones are contiguous (there are no LBA gaps). Zones may have different types.
+
+* Conventional zones: there are no access constraints to LBAs belonging to
+  conventional zones. Any read or write access can be executed, similarly to a
+  regular block device.
+* Sequential zones: these zones accept random reads but must be written
+  sequentially. Each sequential zone has a write pointer maintained by the
+  device that keeps track of the mandatory start LBA position of the next write
+  to the device. As a result of this write constraint, LBAs in a sequential zone
+  cannot be overwritten. Sequential zones must first be erased using a special
+  command (zone reset) before rewriting.
+
+Zoned storage devices can be implemented using various recording and media
+technologies. The most common form of zoned storage today uses the SCSI Zoned
+Block Commands (ZBC) and Zoned ATA Commands (ZAC) interfaces on Shingled
+Magnetic Recording (SMR) HDDs.
+
+Solid State Disk (SSD) storage devices can also implement a zoned interface
+to, for instance, reduce internal write amplification due to garbage collection.
+The NVMe Zoned NameSpace (ZNS) is a technical proposal of the NVMe standard
+committee aiming at adding a zoned storage interface to the NVMe protocol.
+
+Zonefs Overview
+===============
+
+Zonefs exposes the zones of a zoned block device as files. The files
+representing zones are grouped by zone type, which are themselves represented
+by sub-directories. This file structure is built entirely using zone information
+provided by the device and so does not require any complex on-disk metadata
+structure.
+
+On-disk metadata
+----------------
+
+zonefs on-disk metadata is reduced to an immutable super block which
+persistently stores a magic number and optional feature flags and values. On
+mount, zonefs uses blkdev_report_zones() to obtain the device zone configuration
+and populates the mount point with a static file tree solely based on this
+information. File sizes come from the device zone type and write pointer
+position managed by the device itself.
+
+The super block is always written on disk at sector 0. The first zone of the
+device storing the super block is never exposed as a zone file by zonefs. If
+the zone containing the super block is a sequential zone, the mkzonefs format
+tool always "finishes" the zone, that is, it transitions the zone to a full
+state to make it read-only, preventing any data write.
+
+Zone type sub-directories
+-------------------------
+
+Files representing zones of the same type are grouped together under the same
+sub-directory automatically created on mount.
+
+For conventional zones, the sub-directory "cnv" is used. However, this directory
+is created if and only if the device has usable conventional zones. If
+the device only has a single conventional zone at sector 0, the zone will not
+be exposed as a file as it will be used to store the zonefs super block. For
+such devices, the "cnv" sub-directory will not be created.
+
+For sequential write zones, the sub-directory "seq" is used.
+
+These two directories are the only directories that exist in zonefs. Users
+cannot create other directories and can neither rename nor delete the "cnv" and
+"seq" sub-directories.
+
+The size of a directory, as indicated by the st_size field of struct stat
+obtained with the stat() or fstat() system calls, is the number of files
+existing under the directory.
+
+Zone files
+----------
+
+Zone files are named using the number of the zone they represent within the set
+of zones of a particular type. That is, both the "cnv" and "seq" directories
+contain files named "0", "1", "2", ... The file numbers also follow the order
+of increasing zone start sectors on the device.
+
+Read and write operations to zone files are not allowed beyond the maximum
+file size, that is, beyond the zone size. Any access exceeding the zone
+size fails with the -EFBIG error.
+
+Creating, deleting, renaming or modifying any attribute of files and
+sub-directories is not allowed.
+
+The number of blocks of a file as reported by stat() and fstat() indicates the
+size of the file zone, or in other words, the maximum file size.
+
+Conventional zone files
+-----------------------
+
+The size of conventional zone files is fixed to the size of the zone they
+represent. Conventional zone files cannot be truncated.
+
+These files can be randomly read and written using any type of I/O operation:
+buffered I/Os, direct I/Os, memory mapped I/Os (mmap), etc. There are no I/O
+constraints for these files beyond the file size limit mentioned above.
+
+Sequential zone files
+---------------------
+
+The size of sequential zone files grouped in the "seq" sub-directory represents
+the file's zone write pointer position relative to the zone start sector.
+
+Sequential zone files can only be written sequentially, starting from the file
+end, that is, write operations can only be append writes. Zonefs makes no
+attempt at accepting random writes and will fail any write request that has a
+start offset not corresponding to the end of the file, or to the end of the last
+write issued and still in-flight (for asynchronous I/O operations).
+
+Since dirty page writeback by the page cache does not guarantee a sequential
+write pattern, zonefs prevents buffered writes and writeable shared mappings
+on sequential files. Only direct I/O writes are accepted for these files.
+zonefs relies on the sequential delivery of write I/O requests to the device
+implemented by the block layer elevator. An elevator implementing the sequential
+write feature for zoned block device (ELEVATOR_F_ZBD_SEQ_WRITE elevator feature)
+must be used. This type of elevator (e.g. mq-deadline) is set by default
+for zoned block devices on device initialization.
+
+There are no restrictions on the type of I/O used for read operations in
+sequential zone files. Buffered I/Os, direct I/Os and shared read mappings are
+all accepted.
+
+Truncating sequential zone files is allowed only down to 0, in which case the
+zone is reset to rewind the file zone write pointer position to the start of
+the zone, or up to the zone size, in which case the file's zone is transitioned
+to the FULL state (finish zone operation).
+
+Format options
+--------------
+
+Several optional features of zonefs can be enabled at format time.
+
+* Conventional zone aggregation: ranges of contiguous conventional zones can be
+  aggregated into a single larger file instead of the default one file per zone.
+* File ownership: The owner UID and GID of zone files are by default 0 (root)
+  but can be changed to any valid UID/GID.
+* File access permissions: the default 640 access permissions can be changed.
+
+IO error handling
+-----------------
+
+Zoned block devices may fail I/O requests for reasons similar to regular block
+devices, e.g. due to bad sectors. However, in addition to such known I/O
+failure patterns, the standards governing zoned block device behavior define
+additional conditions that result in I/O errors.
+
+* A zone may transition to the read-only condition (BLK_ZONE_COND_READONLY):
+  While the data already written in the zone is still readable, the zone can
+  no longer be written. No user action on the zone (zone management command or
+  read/write access) can change the zone condition back to a normal read/write
+  state. While the reasons for the device to transition a zone to read-only
+  state are not defined by the standards, a typical cause for such a
+  transition would be a defective write head on an HDD (all zones under
+  this head are changed to read-only).
+
+* A zone may transition to the offline condition (BLK_ZONE_COND_OFFLINE):
+  An offline zone can be neither read nor written. No user action can
+  transition an offline zone back to an operational good state. As with zone
+  read-only transitions, the reasons for a drive to transition a zone to the
+  condition are undefined. A typical cause would be a defective read-write head
+  on an HDD causing all zones on the platter under the broken head to be
+  inaccessible.
+
+* Unaligned write errors: These errors result from the host issuing write
+  requests with a start sector that does not correspond to a zone write pointer
+  position when the write request is executed by the device. Even though zonefs
+  enforces sequential file write for sequential zones, unaligned write errors
+  may still happen in the case of a partial failure of a very large direct I/O
+  operation split into multiple BIOs/requests or asynchronous I/O operations.
+  If one of the write requests within the set of sequential write requests
+  issued to the device fails, all write requests queued after it will
+  become unaligned and fail.
+
+* Delayed write errors: similarly to regular block devices, if the device side
+  write cache is enabled, write errors may occur in ranges of previously
+  completed writes when the device write cache is flushed, e.g. on fsync().
+  Similarly to the previous immediate unaligned write error case, delayed write
+  errors can propagate through a stream of cached sequential data for a zone
+  causing all data to be dropped after the sector that caused the error.
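+
+The condition of a zone can be queried from user space on the zoned block
+device itself (not on a zonefs file) with the BLKREPORTZONE ioctl defined in
+<linux/blkzoned.h>. The C sketch below is illustrative. The helper names are
+hypothetical and error handling is minimal.
+
+.. code-block:: c
+
+    #include <linux/blkzoned.h>
+    #include <stdint.h>
+    #include <stdlib.h>
+    #include <sys/ioctl.h>
+
+    /* Short description of a zone condition reported by the device. */
+    const char *zone_cond_str(unsigned int cond)
+    {
+        switch (cond) {
+        case BLK_ZONE_COND_NOT_WP:   return "not-write-pointer";
+        case BLK_ZONE_COND_EMPTY:    return "empty";
+        case BLK_ZONE_COND_IMP_OPEN: return "implicit-open";
+        case BLK_ZONE_COND_EXP_OPEN: return "explicit-open";
+        case BLK_ZONE_COND_CLOSED:   return "closed";
+        case BLK_ZONE_COND_READONLY: return "read-only";
+        case BLK_ZONE_COND_FULL:     return "full";
+        case BLK_ZONE_COND_OFFLINE:  return "offline";
+        default:                     return "unknown";
+        }
+    }
+
+    /*
+     * Return the condition of the zone containing @sector (in 512-byte
+     * units) of the zoned block device open on @fd, or -1 on error.
+     */
+    int zone_cond_at(int fd, uint64_t sector)
+    {
+        struct blk_zone_report *rep;
+        int cond = -1;
+
+        /* Room for the report header plus a single zone descriptor. */
+        rep = calloc(1, sizeof(*rep) + sizeof(struct blk_zone));
+        if (!rep)
+            return -1;
+        rep->sector = sector;
+        rep->nr_zones = 1;
+        if (ioctl(fd, BLKREPORTZONE, rep) == 0 && rep->nr_zones == 1)
+            cond = rep->zones[0].cond;
+        free(rep);
+        return cond;
+    }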
+
+All I/O errors detected by zonefs are reported to the user through the error
+code returned by the system call that triggered or detected the error. The
+recovery actions taken by zonefs in response to I/O errors depend on the I/O
+type (read vs write) and on the reason for the error (bad sector, unaligned
+writes or zone condition change).
+
+* For read I/O errors, zonefs does not execute any particular recovery action,
+  provided that the file zone is still in a good condition and there is no
+  inconsistency between the file inode size and its zone write pointer position.
+  If a problem is detected, I/O error recovery is executed (see the table below).
+
+* For write I/O errors, zonefs I/O error recovery is always executed.
+
+* A zone condition change to read-only or offline also always triggers zonefs
+  I/O error recovery.
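+
+The conditions listed above can be summarized as a small decision function.
+This C sketch models the rules as described here. It is not the zonefs
+kernel implementation, and the type and function names are hypothetical.
+
+.. code-block:: c
+
+    #include <stdbool.h>
+
+    enum io_dir { IO_READ, IO_WRITE };
+
+    /*
+     * @zone_good: zone still in a good condition (not read-only/offline).
+     * @size_matches_wp: inode size consistent with the zone write pointer.
+     */
+    bool needs_error_recovery(enum io_dir dir, bool zone_good,
+                              bool size_matches_wp)
+    {
+        if (dir == IO_WRITE)
+            return true;          /* write I/O errors: always */
+        if (!zone_good)
+            return true;          /* read-only or offline zone: always */
+        return !size_matches_wp;  /* read errors: only on inconsistency */
+    }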
+
+Zonefs minimal I/O error recovery may change a file's size and its access
+permissions.
+
+* File size changes:
+  Immediate or delayed write errors in a sequential zone file may cause the file
+  inode size to be inconsistent with the amount of data successfully written in
+  the file zone. For instance, the partial failure of a multi-BIO large write
+  operation will cause the zone write pointer to advance partially, even though
+  the entire write operation will be reported as failed to the user. In such a
+  case, the file inode size must be advanced to reflect the zone write pointer
+  change and to allow the user to restart writing at the end of the
+  file.
+  A file size may also be reduced to reflect a delayed write error detected on
+  fsync(): in this case, the amount of data effectively written in the zone may
+  be less than originally indicated by the file inode size. After such an I/O
+  error, zonefs always fixes the file inode size to reflect the amount of data
+  persistently stored in the file zone.
+
+* Access permission changes:
+  A zone condition change to read-only is indicated with a change in the file
+  access permissions to render the file read-only. This disables changes to the
+  file attributes and data modification. For offline zones, all permissions
+  (read and write) to the file are disabled.
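+
+The file size fix can be illustrated with the zone information reported by
+the device (struct blk_zone reports the zone start, length and write pointer
+in 512-byte sectors). The C sketch below is a simplified assumption of the
+logic: offline zones get a size of 0, as in the table below, and full zones,
+whose write pointer is assumed not to be meaningful, get the zone size. The
+helper name is hypothetical.
+
+.. code-block:: c
+
+    #include <stdint.h>
+
+    #define SECTOR_SHIFT 9  /* 512-byte sectors */
+
+    /* Hypothetical helper: inode size of a sequential zone file. */
+    uint64_t seq_file_size(uint64_t start, uint64_t len, uint64_t wp,
+                           int offline, int full)
+    {
+        if (offline)
+            return 0;
+        if (full)
+            return len << SECTOR_SHIFT;
+        /* Data persistently written in the zone ends at the write pointer. */
+        return (wp - start) << SECTOR_SHIFT;
+    }
+
+For a 256 MB zone starting at sector 524288 whose write pointer is at sector
+524296, the file size is fixed to (524296 - 524288) * 512 = 4096 bytes.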
+
+Further action taken by zonefs I/O error recovery can be controlled by the user
+with the "errors=xxx" mount option. The table below summarizes the result of
+zonefs I/O error processing depending on the mount option and on the zone
+conditions::
+
+    +--------------+-----------+-----------------------------------------+
+    |              |           |            Post error state             |
+    | "errors=xxx" |  device   |                 access permissions      |
+    |    mount     |   zone    | file         file          device zone  |
+    |    option    | condition | size     read    write    read    write |
+    +--------------+-----------+-----------------------------------------+
+    |              | good      | fixed    yes     no       yes     yes   |
+    | remount-ro   | read-only | fixed    yes     no       yes     no    |
+    | (default)    | offline   |   0      no      no       no      no    |
+    +--------------+-----------+-----------------------------------------+
+    |              | good      | fixed    yes     no       yes     yes   |
+    | zone-ro      | read-only | fixed    yes     no       yes     no    |
+    |              | offline   |   0      no      no       no      no    |
+    +--------------+-----------+-----------------------------------------+
+    |              | good      |   0      no      no       yes     yes   |
+    | zone-offline | read-only |   0      no      no       yes     no    |
+    |              | offline   |   0      no      no       no      no    |
+    +--------------+-----------+-----------------------------------------+
+    |              | good      | fixed    yes     yes      yes     yes   |
+    | repair       | read-only | fixed    yes     no       yes     no    |
+    |              | offline   |   0      no      no       no      no    |
+    +--------------+-----------+-----------------------------------------+
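+
+The file columns of this table can also be read as a lookup table. The C
+sketch below simply encodes the table rows. It is not taken from the zonefs
+sources, and all names in it are hypothetical.
+
+.. code-block:: c
+
+    #include <stdbool.h>
+    #include <string.h>
+
+    enum zone_state { ZS_GOOD, ZS_READONLY, ZS_OFFLINE };
+
+    struct post_error {
+        bool size_zeroed;   /* false: size fixed to the written data */
+        bool file_read;
+        bool file_write;
+    };
+
+    static const char *const error_opts[] = {
+        "remount-ro", "zone-ro", "zone-offline", "repair",
+    };
+
+    /* Rows follow error_opts[]. Columns: good, read-only, offline zone. */
+    static const struct post_error error_table[4][3] = {
+        { { false, true, false }, { false, true, false }, { true, false, false } },
+        { { false, true, false }, { false, true, false }, { true, false, false } },
+        { { true, false, false }, { true, false, false }, { true, false, false } },
+        { { false, true, true },  { false, true, false }, { true, false, false } },
+    };
+
+    /* Fill @pe for "errors=@opt" and zone state @zs. -1 if @opt is unknown. */
+    int post_error_lookup(const char *opt, enum zone_state zs,
+                          struct post_error *pe)
+    {
+        for (size_t i = 0; i < sizeof(error_opts) / sizeof(error_opts[0]); i++) {
+            if (strcmp(opt, error_opts[i]) == 0) {
+                *pe = error_table[i][zs];
+                return 0;
+            }
+        }
+        return -1;
+    }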
+
+Further notes:
+
+* The "errors=remount-ro" mount option is the default behavior of zonefs I/O
+  error processing if no "errors=" mount option is specified.
+* With the "errors=remount-ro" mount option, the change of the file access
+  permissions to read-only applies to all files. The file system is remounted
+  read-only.
+* Access permission and file size changes due to the device transitioning zones
+  to the offline condition are permanent. Remounting or reformatting the device
+  with mkfs.zonefs (mkzonefs) will not return offline zone files to a good
+  state.
+* File access permission changes to read-only due to the device transitioning
+  zones to the read-only condition are permanent. Remounting or reformatting
+  the device will not re-enable file write access.
+* File access permission changes implied by the remount-ro, zone-ro and
+  zone-offline mount options are temporary for zones in a good condition.
+  Unmounting and remounting the file system will restore the previous default
+  (format time values) access rights to the files affected.
+* The repair mount option triggers only the minimal set of I/O error recovery
+  actions, that is, file size fixes for zones in a good condition. Zones
+  indicated as being read-only or offline by the device still imply changes to
+  the zone file access permissions as noted in the table above.
+
+Mount options
+-------------
+
+zonefs defines the "errors=<behavior>" mount option to allow the user to specify
+zonefs behavior in response to I/O errors, inode size inconsistencies or zone
+condition changes. The defined behaviors are as follows:
+
+* remount-ro (default)
+* zone-ro
+* zone-offline
+* repair
+
+The I/O error actions defined for each behavior are detailed in the previous
+section.
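+
+For instance, "mount -t zonefs -o errors=zone-ro /dev/sdX /mnt" selects the
+zone-ro behavior. The same can be done programmatically with mount(2), as in
+the hypothetical C sketch below, which validates the behavior name against
+the list above before passing "errors=<behavior>" as the mount data string.
+
+.. code-block:: c
+
+    #include <errno.h>
+    #include <string.h>
+    #include <sys/mount.h>
+
+    /* Return 1 if @behavior is one of the documented errors= values. */
+    int zonefs_errors_valid(const char *behavior)
+    {
+        static const char *const valid[] = {
+            "remount-ro", "zone-ro", "zone-offline", "repair",
+        };
+
+        for (size_t i = 0; i < sizeof(valid) / sizeof(valid[0]); i++)
+            if (strcmp(behavior, valid[i]) == 0)
+                return 1;
+        return 0;
+    }
+
+    /* Mount @dev on @dir with "errors=<behavior>" (requires privileges). */
+    int mount_zonefs(const char *dev, const char *dir, const char *behavior)
+    {
+        char opts[32] = "errors=";
+
+        if (!zonefs_errors_valid(behavior)) {
+            errno = EINVAL;
+            return -1;
+        }
+        strcat(opts, behavior);  /* longest valid result fits in 32 bytes */
+        return mount(dev, dir, "zonefs", 0, opts);
+    }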
+
+Zonefs User Space Tools
+=======================
+
+The mkzonefs tool is used to format zoned block devices for use with zonefs.
+This tool is available on GitHub at:
+
+https://github.com/damien-lemoal/zonefs-tools
+
+zonefs-tools also includes a test suite which can be run against any zoned
+block device, including a null_blk block device created in zoned mode.
+
+Examples
+--------
+
+The following formats a 15TB host-managed SMR HDD with 256 MB zones
+with the conventional zone aggregation feature enabled::
+
+    # mkzonefs -o aggr_cnv /dev/sdX
+    # mount -t zonefs /dev/sdX /mnt
+    # ls -l /mnt/
+    total 0
+    dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
+    dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
+
+The size of each zone file sub-directory indicates the number of files
+existing for that zone type. In this example, there is only one
+conventional zone file (all conventional zones are aggregated under a single
+file)::
+
+    # ls -l /mnt/cnv
+    total 137101312
+    -rw-r----- 1 root root 140391743488 Nov 25 13:23 0
+
+This aggregated conventional zone file can be used as a regular file::
+
+    # mkfs.ext4 /mnt/cnv/0
+    # mount -o loop /mnt/cnv/0 /data
+
+In this example, the "seq" sub-directory, which groups the files for
+sequential write zones, contains 55356 zone files::
+
+    # ls -lv /mnt/seq
+    total 14511243264
+    -rw-r----- 1 root root 0 Nov 25 13:23 0
+    -rw-r----- 1 root root 0 Nov 25 13:23 1
+    -rw-r----- 1 root root 0 Nov 25 13:23 2
+    ...
+    -rw-r----- 1 root root 0 Nov 25 13:23 55354
+    -rw-r----- 1 root root 0 Nov 25 13:23 55355
+
+For sequential write zone files, the file size changes as data is appended at
+the end of the file, similarly to any regular file system::
+
+    # dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
+    1+0 records in
+    1+0 records out
+    4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s
+
+    # ls -l /mnt/seq/0
+    -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
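+
+The same append write can be issued from a program. In the hypothetical C
+sketch below, the write offset of a synchronous write is taken from the file
+size reported by fstat(), since the end of the file is then the only valid
+write position of a sequential zone file. The 4096-byte buffer alignment
+needed for O_DIRECT is an assumption matching this example device.
+
+.. code-block:: c
+
+    #include <stdlib.h>
+    #include <string.h>
+    #include <sys/stat.h>
+    #include <unistd.h>
+
+    /* Append @len bytes at the current end of the file open on @fd. */
+    ssize_t zone_append(int fd, const void *buf, size_t len)
+    {
+        struct stat st;
+
+        if (fstat(fd, &st) < 0)
+            return -1;
+        return pwrite(fd, buf, len, st.st_size);
+    }
+
+    /*
+     * Equivalent of the dd command above: append one 4 KiB block of zeros.
+     * @fd is expected to be opened with O_WRONLY | O_DIRECT, which needs a
+     * suitably aligned buffer (4096-byte alignment is assumed here).
+     */
+    int append_zero_block(int fd)
+    {
+        void *buf;
+        ssize_t ret;
+
+        if (posix_memalign(&buf, 4096, 4096) != 0)
+            return -1;
+        memset(buf, 0, 4096);
+        ret = zone_append(fd, buf, 4096);
+        free(buf);
+        return ret == 4096 ? 0 : -1;
+    }
+
+A caller would typically obtain @fd with open("/mnt/seq/0", O_WRONLY |
+O_DIRECT). With glibc, O_DIRECT requires defining _GNU_SOURCE before
+including <fcntl.h>.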
+
+The written file can be truncated to the zone size, preventing any further
+write operation::
+
+    # truncate -s 268435456 /mnt/seq/0
+    # ls -l /mnt/seq/0
+    -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
+
+Truncating a file to size 0 frees the file zone storage space and allows
+append-writes to the file to restart::
+
+    # truncate -s 0 /mnt/seq/0
+    # ls -l /mnt/seq/0
+    -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
+
+Since files are statically mapped to zones on the disk, the number of blocks of
+a file as reported by stat() and fstat() indicates the size of the file zone::
+
+    # stat /mnt/seq/0
+    File: /mnt/seq/0
+    Size: 0         	Blocks: 524288     IO Block: 4096   regular empty file
+    Device: 870h/2160d	Inode: 50431       Links: 1
+    Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
+    Access: 2019-11-25 13:23:57.048971997 +0900
+    Modify: 2019-11-25 13:52:25.553805765 +0900
+    Change: 2019-11-25 13:52:25.553805765 +0900
+    Birth: -
+
+The number of blocks of the file ("Blocks") in units of 512B blocks gives the
+maximum file size of 524288 * 512 B = 256 MB, corresponding to the device zone
+size in this example. Note that the "IO Block" field always indicates the
+minimum I/O size for writes and corresponds to the device physical sector size.
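+
+A program can derive the same information with stat(2). The hypothetical C
+sketch below relies on st_blocks being counted in 512-byte units on Linux,
+so that st_blocks * 512 gives the zone size and, for a sequential zone file,
+subtracting st_size gives the space still available for append writes.
+
+.. code-block:: c
+
+    #include <stdint.h>
+    #include <sys/stat.h>
+
+    /* Maximum size of a zonefs file (its zone size), or -1 on error. */
+    int64_t zone_file_max_size(const char *path)
+    {
+        struct stat st;
+
+        if (stat(path, &st) < 0)
+            return -1;
+        return (int64_t)st.st_blocks * 512;
+    }
+
+    /* Bytes that can still be appended to a zone file, or -1 on error. */
+    int64_t zone_file_free_space(const char *path)
+    {
+        struct stat st;
+
+        if (stat(path, &st) < 0)
+            return -1;
+        return (int64_t)st.st_blocks * 512 - (int64_t)st.st_size;
+    }
+
+For /mnt/seq/0 above, zone_file_max_size() would return 524288 * 512 =
+268435456 bytes.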
diff --git a/Documentation/filesystems/zonefs.txt b/Documentation/filesystems/zonefs.txt
deleted file mode 100644
index 935bf22031ca..000000000000
--- a/Documentation/filesystems/zonefs.txt
+++ /dev/null
@@ -1,404 +0,0 @@
-ZoneFS - Zone filesystem for Zoned block devices
-
-Introduction
-============
-
-zonefs is a very simple file system exposing each zone of a zoned block device
-as a file. Unlike a regular POSIX-compliant file system with native zoned block
-device support (e.g. f2fs), zonefs does not hide the sequential write
-constraint of zoned block devices to the user. Files representing sequential
-write zones of the device must be written sequentially starting from the end
-of the file (append only writes).
-
-As such, zonefs is in essence closer to a raw block device access interface
-than to a full-featured POSIX file system. The goal of zonefs is to simplify
-the implementation of zoned block device support in applications by replacing
-raw block device file accesses with a richer file API, avoiding relying on
-direct block device file ioctls which may be more obscure to developers. One
-example of this approach is the implementation of LSM (log-structured merge)
-tree structures (such as used in RocksDB and LevelDB) on zoned block devices
-by allowing SSTables to be stored in a zone file similarly to a regular file
-system rather than as a range of sectors of the entire disk. The introduction
-of the higher level construct "one file is one zone" can help reducing the
-amount of changes needed in the application as well as introducing support for
-different application programming languages.
-
-Zoned block devices
--------------------
-
-Zoned storage devices belong to a class of storage devices with an address
-space that is divided into zones. A zone is a group of consecutive LBAs and all
-zones are contiguous (there are no LBA gaps). Zones may have different types.
-* Conventional zones: there are no access constraints to LBAs belonging to
-  conventional zones. Any read or write access can be executed, similarly to a
-  regular block device.
-* Sequential zones: these zones accept random reads but must be written
-  sequentially. Each sequential zone has a write pointer maintained by the
-  device that keeps track of the mandatory start LBA position of the next write
-  to the device. As a result of this write constraint, LBAs in a sequential zone
-  cannot be overwritten. Sequential zones must first be erased using a special
-  command (zone reset) before rewriting.
-
-Zoned storage devices can be implemented using various recording and media
-technologies. The most common form of zoned storage today uses the SCSI Zoned
-Block Commands (ZBC) and Zoned ATA Commands (ZAC) interfaces on Shingled
-Magnetic Recording (SMR) HDDs.
-
-Solid State Disks (SSD) storage devices can also implement a zoned interface
-to, for instance, reduce internal write amplification due to garbage collection.
-The NVMe Zoned NameSpace (ZNS) is a technical proposal of the NVMe standard
-committee aiming at adding a zoned storage interface to the NVMe protocol.
-
-Zonefs Overview
-===============
-
-Zonefs exposes the zones of a zoned block device as files. The files
-representing zones are grouped by zone type, which are themselves represented
-by sub-directories. This file structure is built entirely using zone information
-provided by the device and so does not require any complex on-disk metadata
-structure.
-
-On-disk metadata
-----------------
-
-zonefs on-disk metadata is reduced to an immutable super block which
-persistently stores a magic number and optional feature flags and values. On
-mount, zonefs uses blkdev_report_zones() to obtain the device zone configuration
-and populates the mount point with a static file tree solely based on this
-information. File sizes come from the device zone type and write pointer
-position managed by the device itself.
-
-The super block is always written on disk at sector 0. The first zone of the
-device storing the super block is never exposed as a zone file by zonefs. If
-the zone containing the super block is a sequential zone, the mkzonefs format
-tool always "finishes" the zone, that is, it transitions the zone to a full
-state to make it read-only, preventing any data write.
-
-Zone type sub-directories
--------------------------
-
-Files representing zones of the same type are grouped together under the same
-sub-directory automatically created on mount.
-
-For conventional zones, the sub-directory "cnv" is used. This directory is
-however created if and only if the device has usable conventional zones. If
-the device only has a single conventional zone at sector 0, the zone will not
-be exposed as a file as it will be used to store the zonefs super block. For
-such devices, the "cnv" sub-directory will not be created.
-
-For sequential write zones, the sub-directory "seq" is used.
-
-These two directories are the only directories that exist in zonefs. Users
-cannot create other directories and cannot rename nor delete the "cnv" and
-"seq" sub-directories.
-
-The size of the directories indicated by the st_size field of struct stat,
-obtained with the stat() or fstat() system calls, indicates the number of files
-existing under the directory.
-
-Zone files
-----------
-
-Zone files are named using the number of the zone they represent within the set
-of zones of a particular type. That is, both the "cnv" and "seq" directories
-contain files named "0", "1", "2", ... The file numbers also represent
-increasing zone start sector on the device.
-
-All read and write operations to zone files are not allowed beyond the file
-maximum size, that is, beyond the zone size. Any access exceeding the zone
-size is failed with the -EFBIG error.
-
-Creating, deleting, renaming or modifying any attribute of files and
-sub-directories is not allowed.
-
-The number of blocks of a file as reported by stat() and fstat() indicates the
-size of the file zone, or in other words, the maximum file size.
-
-Conventional zone files
------------------------
-
-The size of conventional zone files is fixed to the size of the zone they
-represent. Conventional zone files cannot be truncated.
-
-These files can be randomly read and written using any type of I/O operation:
-buffered I/Os, direct I/Os, memory mapped I/Os (mmap), etc. There are no I/O
-constraint for these files beyond the file size limit mentioned above.
-
-Sequential zone files
----------------------
-
-The size of sequential zone files grouped in the "seq" sub-directory represents
-the file's zone write pointer position relative to the zone start sector.
-
-Sequential zone files can only be written sequentially, starting from the file
-end, that is, write operations can only be append writes. Zonefs makes no
-attempt at accepting random writes and will fail any write request that has a
-start offset not corresponding to the end of the file, or to the end of the last
-write issued and still in-flight (for asynchrnous I/O operations).
-
-Since dirty page writeback by the page cache does not guarantee a sequential
-write pattern, zonefs prevents buffered writes and writeable shared mappings
-on sequential files. Only direct I/O writes are accepted for these files.
-zonefs relies on the sequential delivery of write I/O requests to the device
-implemented by the block layer elevator. An elevator implementing the sequential
-write feature for zoned block device (ELEVATOR_F_ZBD_SEQ_WRITE elevator feature)
-must be used. This type of elevator (e.g. mq-deadline) is the set by default
-for zoned block devices on device initialization.
-
-There are no restrictions on the type of I/O used for read operations in
-sequential zone files. Buffered I/Os, direct I/Os and shared read mappings are
-all accepted.
-
-Truncating sequential zone files is allowed only down to 0, in which case, the
-zone is reset to rewind the file zone write pointer position to the start of
-the zone, or up to the zone size, in which case the file's zone is transitioned
-to the FULL state (finish zone operation).
-
-Format options
---------------
-
-Several optional features of zonefs can be enabled at format time.
-* Conventional zone aggregation: ranges of contiguous conventional zones can be
-  aggregated into a single larger file instead of the default one file per zone.
-* File ownership: The owner UID and GID of zone files is by default 0 (root)
-  but can be changed to any valid UID/GID.
-* File access permissions: the default 640 access permissions can be changed.
-
-IO error handling
------------------
-
-Zoned block devices may fail I/O requests for reasons similar to regular block
-devices, e.g. due to bad sectors. However, in addition to such known I/O
-failure pattern, the standards governing zoned block devices behavior define
-additional conditions that result in I/O errors.
-
-* A zone may transition to the read-only condition (BLK_ZONE_COND_READONLY):
-  While the data already written in the zone is still readable, the zone can
-  no longer be written. No user action on the zone (zone management command or
-  read/write access) can change the zone condition back to a normal read/write
-  state. While the reasons for the device to transition a zone to read-only
-  state are not defined by the standards, a typical cause for such transition
-  would be a defective write head on an HDD (all zones under this head are
-  changed to read-only).
-
-* A zone may transition to the offline condition (BLK_ZONE_COND_OFFLINE):
-  An offline zone cannot be read nor written. No user action can transition an
-  offline zone back to an operational good state. Similarly to zone read-only
-  transitions, the reasons for a drive to transition a zone to the offline
-  condition are undefined. A typical cause would be a defective read-write head
-  on an HDD causing all zones on the platter under the broken head to be
-  inaccessible.
-
-* Unaligned write errors: These errors result from the host issuing write
-  requests with a start sector that does not correspond to a zone write pointer
-  position when the write request is executed by the device. Even though zonefs
-  enforces sequential file write for sequential zones, unaligned write errors
-  may still happen in the case of a partial failure of a very large direct I/O
-  operation split into multiple BIOs/requests or asynchronous I/O operations.
-  If one of the write request within the set of sequential write requests
-  issued to the device fails, all write requests after queued after it will
-  become unaligned and fail.
-
-* Delayed write errors: similarly to regular block devices, if the device side
-  write cache is enabled, write errors may occur in ranges of previously
-  completed writes when the device write cache is flushed, e.g. on fsync().
-  Similarly to the previous immediate unaligned write error case, delayed write
-  errors can propagate through a stream of cached sequential data for a zone
-  causing all data to be dropped after the sector that caused the error.
-
-All I/O errors detected by zonefs are notified to the user with an error code
-return for the system call that trigered or detected the error. The recovery
-actions taken by zonefs in response to I/O errors depend on the I/O type (read
-vs write) and on the reason for the error (bad sector, unaligned writes or zone
-condition change).
-
-* For read I/O errors, zonefs does not execute any particular recovery action,
-  but only if the file zone is still in a good condition and there is no
-  inconsistency between the file inode size and its zone write pointer position.
-  If a problem is detected, I/O error recovery is executed (see below table).
-
-* For write I/O errors, zonefs I/O error recovery is always executed.
-
-* A zone condition change to read-only or offline also always triggers zonefs
-  I/O error recovery.
-
-Zonefs minimal I/O error recovery may change a file size and a file access
-permissions.
-
-* File size changes:
-  Immediate or delayed write errors in a sequential zone file may cause the file
-  inode size to be inconsistent with the amount of data successfully written in
-  the file zone. For instance, the partial failure of a multi-BIO large write
-  operation will cause the zone write pointer to advance partially, even though
-  the entire write operation will be reported as failed to the user. In such
-  case, the file inode size must be advanced to reflect the zone write pointer
-  change and eventually allow the user to restart writing at the end of the
-  file.
-  A file size may also be reduced to reflect a delayed write error detected on
-  fsync(): in this case, the amount of data effectively written in the zone may
-  be less than originally indicated by the file inode size. After such I/O
-  error, zonefs always fixes a file inode size to reflect the amount of data
-  persistently stored in the file zone.
-
-* Access permission changes:
-  A zone condition change to read-only is indicated with a change in the file
-  access permissions to render the file read-only. This disables changes to the
-  file attributes and data modification. For offline zones, all permissions
-  (read and write) to the file are disabled.
-
-Further action taken by zonefs I/O error recovery can be controlled by the user
-with the "errors=xxx" mount option. The table below summarizes the result of
-zonefs I/O error processing depending on the mount option and on the zone
-conditions.
-
-    +--------------+-----------+-----------------------------------------+
-    |              |           |            Post error state             |
-    | "errors=xxx" |  device   |                 access permissions      |
-    |    mount     |   zone    | file         file          device zone  |
-    |    option    | condition | size     read    write    read    write |
-    +--------------+-----------+-----------------------------------------+
-    |              | good      | fixed    yes     no       yes     yes   |
-    | remount-ro   | read-only | fixed    yes     no       yes     no    |
-    | (default)    | offline   |   0      no      no       no      no    |
-    +--------------+-----------+-----------------------------------------+
-    |              | good      | fixed    yes     no       yes     yes   |
-    | zone-ro      | read-only | fixed    yes     no       yes     no    |
-    |              | offline   |   0      no      no       no      no    |
-    +--------------+-----------+-----------------------------------------+
-    |              | good      |   0      no      no       yes     yes   |
-    | zone-offline | read-only |   0      no      no       yes     no    |
-    |              | offline   |   0      no      no       no      no    |
-    +--------------+-----------+-----------------------------------------+
-    |              | good      | fixed    yes     yes      yes     yes   |
-    | repair       | read-only | fixed    yes     no       yes     no    |
-    |              | offline   |   0      no      no       no      no    |
-    +--------------+-----------+-----------------------------------------+
-
-Further notes:
-* The "errors=remount-ro" mount option is the default behavior of zonefs I/O
-  error processing if no errors mount option is specified.
-* With the "errors=remount-ro" mount option, the change of the file access
-  permissions to read-only applies to all files. The file system is remounted
-  read-only.
-* Access permission and file size changes due to the device transitioning zones
-  to the offline condition are permanent. Remounting or reformating the device
-  with mkfs.zonefs (mkzonefs) will not change back offline zone files to a good
-  state.
-* File access permission changes to read-only due to the device transitioning
-  zones to the read-only condition are permanent. Remounting or reformating
-  the device will not re-enable file write access.
-* File access permission changes implied by the remount-ro, zone-ro and
-  zone-offline mount options are temporary for zones in a good condition.
-  Unmounting and remounting the file system will restore the previous default
-  (format time values) access rights to the files affected.
-* The repair mount option triggers only the minimal set of I/O error recovery
-  actions, that is, file size fixes for zones in a good condition. Zones
-  indicated as being read-only or offline by the device still imply changes to
-  the zone file access permissions as noted in the table above.
-
-Mount options
--------------
-
-zonefs define the "errors=<behavior>" mount option to allow the user to specify
-zonefs behavior in response to I/O errors, inode size inconsistencies or zone
-condition chages. The defined behaviors are as follow:
-* remount-ro (default)
-* zone-ro
-* zone-offline
-* repair
-
-The I/O error actions defined for each behavior is detailed in the previous
-section.
-
-Zonefs User Space Tools
-=======================
-
-The mkzonefs tool is used to format zoned block devices for use with zonefs.
-This tool is available on Github at:
-
-https://github.com/damien-lemoal/zonefs-tools
-
-zonefs-tools also includes a test suite which can be run against any zoned
-block device, including null_blk block device created with zoned mode.
-
-Examples
---------
-
-The following formats a 15TB host-managed SMR HDD with 256 MB zones
-with the conventional zones aggregation feature enabled.
-
-# mkzonefs -o aggr_cnv /dev/sdX
-# mount -t zonefs /dev/sdX /mnt
-# ls -l /mnt/
-total 0
-dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
-dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
-
-The size of the zone files sub-directories indicate the number of files
-existing for each type of zones. In this example, there is only one
-conventional zone file (all conventional zones are aggregated under a single
-file).
-
-# ls -l /mnt/cnv
-total 137101312
--rw-r----- 1 root root 140391743488 Nov 25 13:23 0
-
-This aggregated conventional zone file can be used as a regular file.
-
-# mkfs.ext4 /mnt/cnv/0
-# mount -o loop /mnt/cnv/0 /data
-
-The "seq" sub-directory grouping files for sequential write zones has in this
-example 55356 zones.
-
-# ls -lv /mnt/seq
-total 14511243264
--rw-r----- 1 root root 0 Nov 25 13:23 0
--rw-r----- 1 root root 0 Nov 25 13:23 1
--rw-r----- 1 root root 0 Nov 25 13:23 2
-...
--rw-r----- 1 root root 0 Nov 25 13:23 55354
--rw-r----- 1 root root 0 Nov 25 13:23 55355
-
-For sequential write zone files, the file size changes as data is appended at
-the end of the file, similarly to any regular file system.
-
-# dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
-1+0 records in
-1+0 records out
-4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s
-
-# ls -l /mnt/seq/0
--rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
-
-The written file can be truncated to the zone size, preventing any further
-write operation.
-
-# truncate -s 268435456 /mnt/seq/0
-# ls -l /mnt/seq/0
--rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
-
-Truncation to 0 size allows freeing the file zone storage space and restart
-append-writes to the file.
-
-# truncate -s 0 /mnt/seq/0
-# ls -l /mnt/seq/0
--rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
-
-Since files are statically mapped to zones on the disk, the number of blocks of
-a file as reported by stat() and fstat() indicates the size of the file zone.
-
-# stat /mnt/seq/0
-  File: /mnt/seq/0
-  Size: 0         	Blocks: 524288     IO Block: 4096   regular empty file
-Device: 870h/2160d	Inode: 50431       Links: 1
-Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
-Access: 2019-11-25 13:23:57.048971997 +0900
-Modify: 2019-11-25 13:52:25.553805765 +0900
-Change: 2019-11-25 13:52:25.553805765 +0900
- Birth: -
-
-The number of blocks of the file ("Blocks") in units of 512B blocks gives the
-maximum file size of 524288 * 512 B = 256 MB, corresponding to the device zone
-size in this example. Of note is that the "IO block" field always indicates the
-minimum I/O size for writes and corresponds to the device physical sector size.
-- 