Diffstat (limited to 'Documentation/filesystems')
| -rw-r--r-- | Documentation/filesystems/9p.rst (renamed from Documentation/filesystems/9p.txt) | 124 |
| -rw-r--r-- | Documentation/filesystems/adfs.rst (renamed from Documentation/filesystems/adfs.txt) | 29 |
| -rw-r--r-- | Documentation/filesystems/affs.rst (renamed from Documentation/filesystems/affs.txt) | 62 |
| -rw-r--r-- | Documentation/filesystems/afs.rst (renamed from Documentation/filesystems/afs.txt) | 73 |
| -rw-r--r-- | Documentation/filesystems/autofs-mount-control.rst (renamed from Documentation/filesystems/autofs-mount-control.txt) | 108 |
| -rw-r--r-- | Documentation/filesystems/befs.rst (renamed from Documentation/filesystems/befs.txt) | 59 |
| -rw-r--r-- | Documentation/filesystems/bfs.rst (renamed from Documentation/filesystems/bfs.txt) | 37 |
| -rw-r--r-- | Documentation/filesystems/btrfs.rst (renamed from Documentation/filesystems/btrfs.txt) | 3 |
| -rw-r--r-- | Documentation/filesystems/ceph.rst (renamed from Documentation/filesystems/ceph.txt) | 32 |
| -rw-r--r-- | Documentation/filesystems/cifs/cifsroot.txt | 2 |
| -rw-r--r-- | Documentation/filesystems/cramfs.rst (renamed from Documentation/filesystems/cramfs.txt) | 19 |
| -rw-r--r-- | Documentation/filesystems/debugfs.rst (renamed from Documentation/filesystems/debugfs.txt) | 66 |
| -rw-r--r-- | Documentation/filesystems/dlmfs.rst (renamed from Documentation/filesystems/dlmfs.txt) | 28 |
| -rw-r--r-- | Documentation/filesystems/ecryptfs.rst (renamed from Documentation/filesystems/ecryptfs.txt) | 51 |
| -rw-r--r-- | Documentation/filesystems/efivarfs.rst (renamed from Documentation/filesystems/efivarfs.txt) | 5 |
| -rw-r--r-- | Documentation/filesystems/erofs.rst (renamed from Documentation/filesystems/erofs.txt) | 177 |
| -rw-r--r-- | Documentation/filesystems/ext2.rst (renamed from Documentation/filesystems/ext2.txt) | 41 |
| -rw-r--r-- | Documentation/filesystems/ext3.rst (renamed from Documentation/filesystems/ext3.txt) | 2 |
| -rw-r--r-- | Documentation/filesystems/f2fs.rst (renamed from Documentation/filesystems/f2fs.txt) | 258 |
| -rw-r--r-- | Documentation/filesystems/fiemap.txt | 6 |
| -rw-r--r-- | Documentation/filesystems/fscrypt.rst | 11 |
| -rw-r--r-- | Documentation/filesystems/fuse.rst | 5 |
| -rw-r--r-- | Documentation/filesystems/gfs2-uevents.rst (renamed from Documentation/filesystems/gfs2-uevents.txt) | 20 |
| -rw-r--r-- | Documentation/filesystems/gfs2.rst (renamed from Documentation/filesystems/gfs2.txt) | 20 |
| -rw-r--r-- | Documentation/filesystems/hfs.rst (renamed from Documentation/filesystems/hfs.txt) | 23 |
| -rw-r--r-- | Documentation/filesystems/hfsplus.rst (renamed from Documentation/filesystems/hfsplus.txt) | 2 |
| -rw-r--r-- | Documentation/filesystems/hpfs.rst (renamed from Documentation/filesystems/hpfs.txt) | 239 |
| -rw-r--r-- | Documentation/filesystems/index.rst | 47 |
| -rw-r--r-- | Documentation/filesystems/inotify.rst (renamed from Documentation/filesystems/inotify.txt) | 33 |
| -rw-r--r-- | Documentation/filesystems/isofs.rst | 64 |
| -rw-r--r-- | Documentation/filesystems/isofs.txt | 48 |
| -rw-r--r-- | Documentation/filesystems/nfs/index.rst | 13 |
| -rw-r--r-- | Documentation/filesystems/nfs/knfsd-stats.rst (renamed from Documentation/filesystems/nfs/knfsd-stats.txt) | 17 |
| -rw-r--r-- | Documentation/filesystems/nfs/nfs41-server.rst | 256 |
| -rw-r--r-- | Documentation/filesystems/nfs/nfs41-server.txt | 173 |
| -rw-r--r-- | Documentation/filesystems/nfs/pnfs.rst (renamed from Documentation/filesystems/nfs/pnfs.txt) | 25 |
| -rw-r--r-- | Documentation/filesystems/nfs/rpc-cache.rst (renamed from Documentation/filesystems/nfs/rpc-cache.txt) | 136 |
| -rw-r--r-- | Documentation/filesystems/nfs/rpc-server-gss.rst (renamed from Documentation/filesystems/nfs/rpc-server-gss.txt) | 19 |
| -rw-r--r-- | Documentation/filesystems/nilfs2.rst (renamed from Documentation/filesystems/nilfs2.txt) | 40 |
| -rw-r--r-- | Documentation/filesystems/ntfs.rst (renamed from Documentation/filesystems/ntfs.txt) | 145 |
| -rw-r--r-- | Documentation/filesystems/ocfs2-online-filecheck.rst (renamed from Documentation/filesystems/ocfs2-online-filecheck.txt) | 45 |
| -rw-r--r-- | Documentation/filesystems/ocfs2.rst (renamed from Documentation/filesystems/ocfs2.txt) | 31 |
| -rw-r--r-- | Documentation/filesystems/omfs.rst | 112 |
| -rw-r--r-- | Documentation/filesystems/omfs.txt | 106 |
| -rw-r--r-- | Documentation/filesystems/orangefs.rst (renamed from Documentation/filesystems/orangefs.txt) | 209 |
| -rw-r--r-- | Documentation/filesystems/overlayfs.rst | 82 |
| -rw-r--r-- | Documentation/filesystems/path-lookup.rst | 7 |
| -rw-r--r-- | Documentation/filesystems/proc.rst (renamed from Documentation/filesystems/proc.txt) | 1544 |
| -rw-r--r-- | Documentation/filesystems/qnx6.rst (renamed from Documentation/filesystems/qnx6.txt) | 24 |
| -rw-r--r-- | Documentation/filesystems/ramfs-rootfs-initramfs.rst (renamed from Documentation/filesystems/ramfs-rootfs-initramfs.txt) | 54 |
| -rw-r--r-- | Documentation/filesystems/relay.rst (renamed from Documentation/filesystems/relay.txt) | 139 |
| -rw-r--r-- | Documentation/filesystems/romfs.rst (renamed from Documentation/filesystems/romfs.txt) | 42 |
| -rw-r--r-- | Documentation/filesystems/squashfs.rst (renamed from Documentation/filesystems/squashfs.txt) | 60 |
| -rw-r--r-- | Documentation/filesystems/sysfs.rst (renamed from Documentation/filesystems/sysfs.txt) | 324 |
| -rw-r--r-- | Documentation/filesystems/sysv-fs.rst (renamed from Documentation/filesystems/sysv-fs.txt) | 153 |
| -rw-r--r-- | Documentation/filesystems/tmpfs.rst (renamed from Documentation/filesystems/tmpfs.txt) | 44 |
| -rw-r--r-- | Documentation/filesystems/ubifs-authentication.rst | 10 |
| -rw-r--r-- | Documentation/filesystems/ubifs.rst (renamed from Documentation/filesystems/ubifs.txt) | 25 |
| -rw-r--r-- | Documentation/filesystems/udf.rst (renamed from Documentation/filesystems/udf.txt) | 21 |
| -rw-r--r-- | Documentation/filesystems/virtiofs.rst | 2 |
| -rw-r--r-- | Documentation/filesystems/zonefs.rst (renamed from Documentation/filesystems/zonefs.txt) | 126 |
61 files changed, 3309 insertions, 2369 deletions
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.rst
index fec7144e817c..671fef39a802 100644
--- a/Documentation/filesystems/9p.txt
+++ b/Documentation/filesystems/9p.rst
@@ -1,7 +1,10 @@
-	  	    v9fs: Plan 9 Resource Sharing for Linux
-		    =======================================
+.. SPDX-License-Identifier: GPL-2.0

-ABOUT
+=======================================
+v9fs: Plan 9 Resource Sharing for Linux
+=======================================
+
+About
 =====

 v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
@@ -14,32 +17,34 @@ and Maya Gokhale.  Additional development by Greg Watson

 The best detailed explanation of the Linux implementation and applications of
 the 9p client is available in the form of a USENIX paper:
+
    http://www.usenix.org/events/usenix05/tech/freenix/hensbergen.html

 Other applications are described in the following papers:
+
 	* XCPU & Clustering
-		http://xcpu.org/papers/xcpu-talk.pdf
+	  http://xcpu.org/papers/xcpu-talk.pdf
 	* KVMFS: control file system for KVM
-		http://xcpu.org/papers/kvmfs.pdf
+	  http://xcpu.org/papers/kvmfs.pdf
 	* CellFS: A New Programming Model for the Cell BE
-		http://xcpu.org/papers/cellfs-talk.pdf
+	  http://xcpu.org/papers/cellfs-talk.pdf
 	* PROSE I/O: Using 9p to enable Application Partitions
-		http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
+	  http://plan9.escet.urjc.es/iwp9/cready/PROSE_iwp9_2006.pdf
 	* VirtFS: A Virtualization Aware File System pass-through
-		http://goo.gl/3WPDg
+	  http://goo.gl/3WPDg

-USAGE
+Usage
 =====

-For remote file server:
+For remote file server::

 	mount -t 9p 10.10.1.2 /mnt/9

-For Plan 9 From User Space applications (http://swtch.com/plan9)
+For Plan 9 From User Space applications (http://swtch.com/plan9)::

 	mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER

-For server running on QEMU host with virtio transport:
+For server running on QEMU host with virtio transport::

 	mount -t 9p -o trans=virtio <mount_tag> /mnt/9

@@ -48,18 +53,22 @@ mount points. Each 9P export is seen by the client as a virtio device with an
 associated "mount_tag" property. Available mount tags can be
 seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files.

-OPTIONS
+Options
 =======

+  ============= ===============================================================
   trans=name	select an alternative transport.  Valid options are
   		currently:
-			unix 	- specifying a named pipe mount point
-			tcp	- specifying a normal TCP/IP connection
-			fd   	- used passed file descriptors for connection
-                                (see rfdno and wfdno)
-			virtio	- connect to the next virtio channel available
-				(from QEMU with trans_virtio module)
-			rdma	- connect to a specified RDMA channel
+
+			========  ============================================
+			unix 	  specifying a named pipe mount point
+			tcp	  specifying a normal TCP/IP connection
+			fd   	  used passed file descriptors for connection
+                                  (see rfdno and wfdno)
+			virtio	  connect to the next virtio channel available
+				  (from QEMU with trans_virtio module)
+			rdma	  connect to a specified RDMA channel
+			========  ============================================

   uname=name	user name to attempt mount as on the remote server.  The
   		server may override or ignore this value.  Certain user
@@ -69,28 +78,36 @@ OPTIONS
   		offering several exported file systems.

   cache=mode	specifies a caching policy.  By default, no caches are used.
-                        none = default no cache policy, metadata and data
+
+                        none
+				default no cache policy, metadata and data
                                 alike are synchronous.
-			loose = no attempts are made at consistency,
+			loose
+				no attempts are made at consistency,
                                 intended for exclusive, read-only mounts
-                        fscache = use FS-Cache for a persistent, read-only
+                        fscache
+				use FS-Cache for a persistent, read-only
 				cache backend.
-                        mmap = minimal cache that is only used for read-write
+                        mmap
+				minimal cache that is only used for read-write
                                 mmap.  Northing else is cached, like cache=none

   debug=n	specifies debug level.  The debug level is a bitmask.
-			0x01  = display verbose error messages
-			0x02  = developer debug (DEBUG_CURRENT)
-			0x04  = display 9p trace
-			0x08  = display VFS trace
-			0x10  = display Marshalling debug
-			0x20  = display RPC debug
-			0x40  = display transport debug
-			0x80  = display allocation debug
-			0x100 = display protocol message debug
-			0x200 = display Fid debug
-			0x400 = display packet debug
-			0x800 = display fscache tracing debug
+
+			=====   ================================
+			0x01    display verbose error messages
+			0x02    developer debug (DEBUG_CURRENT)
+			0x04    display 9p trace
+			0x08    display VFS trace
+			0x10    display Marshalling debug
+			0x20    display RPC debug
+			0x40    display transport debug
+			0x80    display allocation debug
+			0x100   display protocol message debug
+			0x200   display Fid debug
+			0x400   display packet debug
+			0x800   display fscache tracing debug
+			=====   ================================

   rfdno=n	the file descriptor for reading with trans=fd

@@ -103,9 +120,12 @@ OPTIONS
   noextend	force legacy mode (no 9p2000.u or 9p2000.L semantics)

   version=name	Select 9P protocol version. Valid options are:
-			9p2000          - Legacy mode (same as noextend)
-			9p2000.u        - Use 9P2000.u protocol
-			9p2000.L        - Use 9P2000.L protocol
+
+			========        ==============================
+			9p2000          Legacy mode (same as noextend)
+			9p2000.u        Use 9P2000.u protocol
+			9p2000.L        Use 9P2000.L protocol
+			========        ==============================

   dfltuid	attempt to mount as a particular uid

@@ -118,22 +138,37 @@ OPTIONS
 		hosts.  This functionality will be expanded in later versions.

   access	there are four access modes.
-			user  = if a user tries to access a file on v9fs
+			user
+				if a user tries to access a file on v9fs
 			        filesystem for the first time, v9fs sends an
 			        attach command (Tattach) for that user.
 				This is the default mode.
-			<uid> = allows only user with uid=<uid> to access
+			<uid>
+				allows only user with uid=<uid> to access
 				the files on the mounted filesystem
-			any   = v9fs does single attach and performs all
+			any
+				v9fs does single attach and performs all
 				operations as one user
-			client = ACL based access check on the 9p client
+			clien
+				 ACL based access check on the 9p client
 			         side for access validation

   cachetag	cache tag to use the specified persistent cache.
 		cache tags for existing cache sessions can be listed at
 		/sys/fs/9p/caches. (applies only to cache=fscache)
+  ============= ===============================================================
+
+Behavior
+========
+
+This section aims at describing 9p 'quirks' that can be different
+from a local filesystem behaviors.

-RESOURCES
+ - Setting O_NONBLOCK on a file will make client reads return as early
+   as the server returns some data instead of trying to fill the read
+   buffer with the requested amount of bytes or end of file is reached.
+
+Resources
 =========

 Protocol specifications are maintained on github:
@@ -158,4 +193,3 @@ http://plan9.bell-labs.com/plan9

 For information on Plan 9 from User Space (Plan 9 applications and libraries
 ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
-
diff --git a/Documentation/filesystems/adfs.txt b/Documentation/filesystems/adfs.rst
index 0baa8e8c1fc1..5b22cae38e5e 100644
--- a/Documentation/filesystems/adfs.txt
+++ b/Documentation/filesystems/adfs.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================
+Acorn Disc Filing System - ADFS
+===============================
+
 Filesystems supported by ADFS
 -----------------------------

@@ -25,6 +31,7 @@ directory updates, specifically updating the access mode and timestamp.

 Mount options for ADFS
 ----------------------
+  ============  ======================================================
   uid=nnn	All files in the partition will be owned by
 		user id nnn.  Default 0 (root).
   gid=nnn	All files in the partition will be in group
@@ -36,22 +43,23 @@ Mount options for ADFS
   ftsuffix=n	When ftsuffix=0, no file type suffix will be applied.
 		When ftsuffix=1, a hexadecimal suffix corresponding to
 		the RISC OS file type will be added.  Default 0.
+  ============  ======================================================

 Mapping of ADFS permissions to Linux permissions
 ------------------------------------------------

   ADFS permissions consist of the following:

-	Owner read
-	Owner write
-	Other read
-	Other write
+	- Owner read
+	- Owner write
+	- Other read
+	- Other write

   (In older versions, an 'execute' permission did exist, but this
-   does not hold the same meaning as the Linux 'execute' permission
-   and is now obsolete).
+  does not hold the same meaning as the Linux 'execute' permission
+  and is now obsolete).

-  The mapping is performed as follows:
+  The mapping is performed as follows::

 	Owner read				-> -r--r--r--
 	Owner write				-> --w--w---w
@@ -66,17 +74,18 @@ Mapping of ADFS permissions to Linux permissions
 	Possible other mode permissions		-> ----rwxrwx

   Hence, with the default masks, if a file is owner read/write, and
-  not a UnixExec filetype, then the permissions will be:
+  not a UnixExec filetype, then the permissions will be::

 			-rw-------

   However, if the masks were ownmask=0770,othmask=0007, then this would
-  be modified to:
+  be modified to::
+

 			-rw-rw----

   There is no restriction on what you can do with these masks.  You may
   wish that either read bits give read access to the file for all, but
-  keep the default write protection (ownmask=0755,othmask=0577):
+  keep the default write protection (ownmask=0755,othmask=0577)::

 			-rw-r--r--
diff --git a/Documentation/filesystems/affs.txt b/Documentation/filesystems/affs.rst
index 71b63c2b9841..7f1a40dce6d3 100644
--- a/Documentation/filesystems/affs.txt
+++ b/Documentation/filesystems/affs.rst
@@ -1,9 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============================
 Overview of Amiga Filesystems
 =============================

 Not all varieties of the Amiga filesystems are supported for reading and
 writing. The Amiga currently knows six different filesystems:

+==============	===============================================================
 DOS\0		The old or original filesystem, not really suited for
 		hard disks and normally not used on them, either.
 		Supported read/write.
@@ -23,6 +27,7 @@ DOS\4		The original filesystem with directory cache. The directory
 		sense on hard disks. Supported read only.

 DOS\5		The Fast File System with directory cache. Supported read only.
+==============	===============================================================

 All of the above filesystems allow block sizes from 512 to 32K bytes.
 Supported block sizes are: 512, 1024, 2048 and 4096 bytes. Larger blocks
@@ -36,14 +41,18 @@ are supported, too.
 Mount options for the AFFS
 ==========================

-protect		If this option is set, the protection bits cannot be altered.
+protect
+		If this option is set, the protection bits cannot be altered.

-setuid[=uid]	This sets the owner of all files and directories in the file
+setuid[=uid]
+		This sets the owner of all files and directories in the file
 		system to uid or the uid of the current user, respectively.

-setgid[=gid]	Same as above, but for gid.
+setgid[=gid]
+		Same as above, but for gid.

-mode=mode	Sets the mode flags to the given (octal) value, regardless
+mode=mode
+		Sets the mode flags to the given (octal) value, regardless
 		of the original permissions. Directories will get an x
 		permission if the corresponding r bit is set.
 		This is useful since most of the plain AmigaOS files
@@ -53,33 +62,41 @@ nofilenametruncate
 		The file system will return an error when filename exceeds
 		standard maximum filename length (30 characters).

-reserved=num	Sets the number of reserved blocks at the start of the
+reserved=num
+		Sets the number of reserved blocks at the start of the
 		partition to num. You should never need this option.
 		Default is 2.

-root=block	Sets the block number of the root block. This should never
+root=block
+		Sets the block number of the root block. This should never
 		be necessary.

-bs=blksize	Sets the blocksize to blksize. Valid block sizes are 512,
+bs=blksize
+		Sets the blocksize to blksize. Valid block sizes are 512,
 		1024, 2048 and 4096. Like the root option, this should
 		never be necessary, as the affs can figure it out itself.

-quiet		The file system will not return an error for disallowed
+quiet
+		The file system will not return an error for disallowed
 		mode changes.

-verbose		The volume name, file system type and block size will
+verbose
+		The volume name, file system type and block size will
 		be written to the syslog when the filesystem is mounted.

-mufs		The filesystem is really a muFS, also it doesn't
+mufs
+		The filesystem is really a muFS, also it doesn't
 		identify itself as one. This option is necessary if
 		the filesystem wasn't formatted as muFS, but is used
 		as one.

-prefix=path	Path will be prefixed to every absolute path name of
+prefix=path
+		Path will be prefixed to every absolute path name of
 		symbolic links on an AFFS partition. Default = "/".
 		(See below.)

-volume=name	When symbolic links with an absolute path are created
+volume=name
+		When symbolic links with an absolute path are created
 		on an AFFS partition, name will be prepended as the
 		volume name. Default = "" (empty string).
 		(See below.)
@@ -119,7 +136,7 @@ The Linux rwxrwxrwx file mode is handled as follows:

   - All other flags (suid, sgid, ...) are ignored and will
     not be retained.
-    
+
 Newly created files and directories will get the user and group ID
 of the current user and a mode according to the umask.

@@ -148,11 +165,13 @@ might be "User", "WB" and "Graphics", the mount points /amiga/User,
 Examples
 ========

-Command line:
+Command line::
+
     mount  Archive/Amiga/Workbench3.1.adf /mnt -t affs -o loop,verbose
     mount  /dev/sda3 /Amiga -t affs

-/etc/fstab entry:
+/etc/fstab entry::
+
     /dev/sdb5	/amiga/Workbench    affs    noauto,user,exec,verbose 0 0

 IMPORTANT NOTE
@@ -170,7 +189,8 @@ before booting Windows!

 If the damage is already done, the following should fix the RDB
 (where <disk> is the device name).
-DO AT YOUR OWN RISK:
+
+DO AT YOUR OWN RISK::

   dd if=/dev/<disk> of=rdb.tmp count=1
   cp rdb.tmp rdb.fixed
@@ -189,10 +209,14 @@ By default, filenames are truncated to 30 characters without warning.
 'nofilenametruncate' mount option can change that behavior.

 Case is ignored by the affs in filename matching, but Linux shells
-do care about the case. Example (with /wb being an affs mounted fs):
+do care about the case. Example (with /wb being an affs mounted fs)::
+
     rm /wb/WRONGCASE
-will remove /mnt/wrongcase, but
+
+will remove /mnt/wrongcase, but::
+
     rm /wb/WR*
+
 will not since the names are matched by the shell.

 The block allocation is designed for hard disk partitions. If more
@@ -219,4 +243,4 @@ due to an incompatibility with the Amiga floppy controller.

 If you are interested in an Amiga Emulator for Linux, look at

-http://web.archive.org/web/*/http://www.freiburg.linux.de/~uae/
+http://web.archive.org/web/%2E/http://www.freiburg.linux.de/~uae/
diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.rst
index 8c6ea7b41048..c4ec39a5966e 100644
--- a/Documentation/filesystems/afs.txt
+++ b/Documentation/filesystems/afs.rst
@@ -1,8 +1,10 @@
-			     ====================
-			     kAFS: AFS FILESYSTEM
-			     ====================
+.. SPDX-License-Identifier: GPL-2.0

-Contents:
+====================
+kAFS: AFS FILESYSTEM
+====================
+
+.. Contents:

  - Overview.
  - Usage.
@@ -14,8 +16,7 @@ Contents:
  - The @sys substitution.


-========
-OVERVIEW
+Overview
 ========

 This filesystem provides a fairly simple secure AFS filesystem driver. It is
@@ -35,35 +36,33 @@ It does not yet support the following AFS features:
  (*) pioctl() system call.


-===========
-COMPILATION
+Compilation
 ===========

 The filesystem should be enabled by turning on the kernel configuration
-options:
+options::

 	CONFIG_AF_RXRPC		- The RxRPC protocol transport
 	CONFIG_RXKAD		- The RxRPC Kerberos security handler
 	CONFIG_AFS		- The AFS filesystem

-Additionally, the following can be turned on to aid debugging:
+Additionally, the following can be turned on to aid debugging::

 	CONFIG_AF_RXRPC_DEBUG	- Permit AF_RXRPC debugging to be enabled
 	CONFIG_AFS_DEBUG	- Permit AFS debugging to be enabled

 They permit the debugging messages to be turned on dynamically by manipulating
-the masks in the following files:
+the masks in the following files::

 	/sys/module/af_rxrpc/parameters/debug
 	/sys/module/kafs/parameters/debug


-=====
-USAGE
+Usage
 =====

 When inserting the driver modules the root cell must be specified along with a
-list of volume location server IP addresses:
+list of volume location server IP addresses::

 	modprobe rxrpc
 	modprobe kafs rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
@@ -77,14 +76,14 @@ The second module is the kerberos RxRPC security driver, and the third module
 is the actual filesystem driver for the AFS filesystem.

 Once the module has been loaded, more modules can be added by the following
-procedure:
+procedure::

 	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells

 Where the parameters to the "add" command are the name of a cell and a list of
 volume location servers within that cell, with the latter separated by colons.

-Filesystems can be mounted anywhere by commands similar to the following:
+Filesystems can be mounted anywhere by commands similar to the following::

 	mount -t afs "%cambridge.redhat.com:root.afs." /afs
 	mount -t afs "#cambridge.redhat.com:root.cell." /afs/cambridge
@@ -104,8 +103,7 @@ named volume will be looked up in the cell specified during modprobe.
 Additional cells can be added through /proc (see later section).


-===========
-MOUNTPOINTS
+Mountpoints
 ===========

 AFS has a concept of mountpoints. In AFS terms, these are specially formatted
@@ -123,42 +121,40 @@ culled first.  If all are culled, then the requested volume will also be
 unmounted, otherwise error EBUSY will be returned.

 This can be used by the administrator to attempt to unmount the whole AFS tree
-mounted on /afs in one go by doing:
+mounted on /afs in one go by doing::

 	umount /afs


-============
-DYNAMIC ROOT
+Dynamic Root
 ============

 A mount option is available to create a serverless mount that is only usable
-for dynamic lookup.  Creating such a mount can be done by, for example:
+for dynamic lookup.  Creating such a mount can be done by, for example::

 	mount -t afs none /afs -o dyn

 This creates a mount that just has an empty directory at the root.  Attempting
 to look up a name in this directory will cause a mountpoint to be created that
-looks up a cell of the same name, for example:
+looks up a cell of the same name, for example::

 	ls /afs/grand.central.org/


-===============
-PROC FILESYSTEM
+Proc Filesystem
 ===============

 The AFS modules creates a "/proc/fs/afs/" directory and populates it:

   (*) A "cells" file that lists cells currently known to the afs module and
-      their usage counts:
+      their usage counts::

 	[root@andromeda ~]# cat /proc/fs/afs/cells
 	USE NAME
 	  3 cambridge.redhat.com

   (*) A directory per cell that contains files that list volume location
-      servers, volumes, and active servers known within that cell.
+      servers, volumes, and active servers known within that cell::

 	[root@andromeda ~]# cat /proc/fs/afs/cambridge.redhat.com/servers
 	USE ADDR            STATE
@@ -171,8 +167,7 @@ The AFS modules creates a "/proc/fs/afs/" directory and populates it:
 	  1 Val 20000000 20000001 20000002 root.afs


-=================
-THE CELL DATABASE
+The Cell Database
 =================

 The filesystem maintains an internal database of all the cells it knows and the
@@ -181,7 +176,7 @@ the system belongs is added to the database when modprobe is performed by the
 "rootcell=" argument or, if compiled in, using a "kafs.rootcell=" argument on
 the kernel command line.

-Further cells can be added by commands similar to the following:
+Further cells can be added by commands similar to the following::

 	echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
 	echo add grand.central.org 18.9.48.14:128.2.203.61:130.237.48.87 >/proc/fs/afs/cells
@@ -189,8 +184,7 @@ Further cells can be added by commands similar to the following:
 No other cell database operations are available at this time.


-========
-SECURITY
+Security
 ========

 Secure operations are initiated by acquiring a key using the klog program.  A
@@ -198,17 +192,17 @@ very primitive klog program is available at:

 	http://people.redhat.com/~dhowells/rxrpc/klog.c

-This should be compiled by:
+This should be compiled by::

 	make klog LDLIBS="-lcrypto -lcrypt -lkrb4 -lkeyutils"

-And then run as:
+And then run as::

 	./klog

 Assuming it's successful, this adds a key of type RxRPC, named for the service
 and cell, eg: "afs@<cellname>".  This can be viewed with the keyctl program or
-by cat'ing /proc/keys:
+by cat'ing /proc/keys::

 	[root@andromeda ~]# keyctl show
 	Session Keyring
@@ -232,20 +226,19 @@ socket), then the operations on the file will be made with key that was used to
 open the file.


-=====================
-THE @SYS SUBSTITUTION
+The @sys Substitution
 =====================

 The list of up to 16 @sys substitutions for the current network namespace can
-be configured by writing a list to /proc/fs/afs/sysname:
+be configured by writing a list to /proc/fs/afs/sysname::

 	[root@andromeda ~]# echo foo amd64_linux_26 >/proc/fs/afs/sysname

-or cleared entirely by writing an empty list:
+or cleared entirely by writing an empty list::

 	[root@andromeda ~]# echo >/proc/fs/afs/sysname

-The current list for current network namespace can be retrieved by:
+The current list for current network namespace can be retrieved by::

 	[root@andromeda ~]# cat /proc/fs/afs/sysname
 	foo
diff --git a/Documentation/filesystems/autofs-mount-control.txt b/Documentation/filesystems/autofs-mount-control.rst
index acc02fc57993..2903aed92316 100644
--- a/Documentation/filesystems/autofs-mount-control.txt
+++ b/Documentation/filesystems/autofs-mount-control.rst
@@ -1,4 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0

+====================================================================
 Miscellaneous Device control operations for the autofs kernel module
 ====================================================================

@@ -36,24 +38,24 @@ For example, there are two types of automount maps, direct (in the kernel
 module source you will see a third type called an offset, which is just
 a direct mount in disguise) and indirect.

-Here is a master map with direct and indirect map entries:
+Here is a master map with direct and indirect map entries::

-/-      /etc/auto.direct
-/test   /etc/auto.indirect
+    /-      /etc/auto.direct
+    /test   /etc/auto.indirect

-and the corresponding map files:
+and the corresponding map files::

-/etc/auto.direct:
+    /etc/auto.direct:

-/automount/dparse/g6  budgie:/autofs/export1
-/automount/dparse/g1  shark:/autofs/export1
-and so on.
+    /automount/dparse/g6  budgie:/autofs/export1
+    /automount/dparse/g1  shark:/autofs/export1
+    and so on.

-/etc/auto.indirect:
+/etc/auto.indirect::

-g1    shark:/autofs/export1
-g6    budgie:/autofs/export1
-and so on.
+    g1    shark:/autofs/export1
+    g6    budgie:/autofs/export1
+    and so on.

 For the above indirect map an autofs file system is mounted on /test and
 mounts are triggered for each sub-directory key by the inode lookup
@@ -69,23 +71,23 @@ use the follow_link inode operation to trigger the mount.
 But, each entry in direct and indirect maps can have offsets (making
 them multi-mount map entries).

-For example, an indirect mount map entry could also be:
+For example, an indirect mount map entry could also be::

-g1  \
-   /        shark:/autofs/export5/testing/test \
-   /s1      shark:/autofs/export/testing/test/s1 \
-   /s2      shark:/autofs/export5/testing/test/s2 \
-   /s1/ss1  shark:/autofs/export1 \
-   /s2/ss2  shark:/autofs/export2
+    g1  \
+    /        shark:/autofs/export5/testing/test \
+    /s1      shark:/autofs/export/testing/test/s1 \
+    /s2      shark:/autofs/export5/testing/test/s2 \
+    /s1/ss1  shark:/autofs/export1 \
+    /s2/ss2  shark:/autofs/export2

-and a similarly a direct mount map entry could also be:
+and a similarly a direct mount map entry could also be::

-/automount/dparse/g1 \
-    /       shark:/autofs/export5/testing/test \
-    /s1     shark:/autofs/export/testing/test/s1 \
-    /s2     shark:/autofs/export5/testing/test/s2 \
-    /s1/ss1 shark:/autofs/export2 \
-    /s2/ss2 shark:/autofs/export2
+    /automount/dparse/g1 \
+	/       shark:/autofs/export5/testing/test \
+	/s1     shark:/autofs/export/testing/test/s1 \
+	/s2     shark:/autofs/export5/testing/test/s2 \
+	/s1/ss1 shark:/autofs/export2 \
+	/s2/ss2 shark:/autofs/export2

 One of the issues with version 4 of autofs was that, when mounting an
 entry with a large number of offsets, possibly with nesting, we needed
@@ -170,32 +172,32 @@ autofs Miscellaneous Device mount control interface
 The control interface is opening a device node, typically /dev/autofs.

 All the ioctls use a common structure to pass the needed parameter
-information and return operation results:
-
-struct autofs_dev_ioctl {
-	__u32 ver_major;
-	__u32 ver_minor;
-	__u32 size;             /* total size of data passed in
-				 * including this struct */
-	__s32 ioctlfd;          /* automount command fd */
-
-	/* Command parameters */
-	union {
-		struct args_protover		protover;
-		struct args_protosubver		protosubver;
-		struct args_openmount		openmount;
-		struct args_ready		ready;
-		struct args_fail		fail;
-		struct args_setpipefd		setpipefd;
-		struct args_timeout		timeout;
-		struct args_requester		requester;
-		struct args_expire		expire;
-		struct args_askumount		askumount;
-		struct args_ismountpoint	ismountpoint;
-	};
-
-	char path[0];
-};
+information and return operation results::
+
+    struct autofs_dev_ioctl {
+	    __u32 ver_major;
+	    __u32 ver_minor;
+	    __u32 size;             /* total size of data passed in
+				    * including this struct */
+	    __s32 ioctlfd;          /* automount command fd */
+
+	    /* Command parameters */
+	    union {
+		    struct args_protover		protover;
+		    struct args_protosubver		protosubver;
+		    struct args_openmount		openmount;
+		    struct args_ready		ready;
+		    struct args_fail		fail;
+		    struct args_setpipefd		setpipefd;
+		    struct args_timeout		timeout;
+		    struct args_requester		requester;
+		    struct args_expire		expire;
+		    struct args_askumount		askumount;
+		    struct args_ismountpoint	ismountpoint;
+	    };
+
+	    char path[0];
+    };

 The ioctlfd field is a mount point file descriptor of an autofs mount
 point. It is returned by the open call and is used by all calls except
@@ -212,7 +214,7 @@ is used account for the increased structure length when translating the
 structure sent from user space.

 This structure can be initialized before setting specific fields by using
-the void function call init_autofs_dev_ioctl(struct autofs_dev_ioctl *).
+the void function call init_autofs_dev_ioctl(``struct autofs_dev_ioctl *``).
 
 All of the ioctls perform a copy of this structure from user space to
 kernel space and return -EINVAL if the size parameter is smaller than
diff --git a/Documentation/filesystems/befs.txt b/Documentation/filesystems/befs.rst
index da45e6c842b8..79f9740d76ff 100644
--- a/Documentation/filesystems/befs.txt
+++ b/Documentation/filesystems/befs.rst
@@ -1,48 +1,54 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=========================
 BeOS filesystem for Linux
+=========================
 
 Document last updated: Dec 6, 2001
 
-WARNING
+Warning
 =======
 Make sure you understand that this is alpha software.  This means that the
-implementation is neither complete nor well-tested. 
+implementation is neither complete nor well-tested.
 
 I DISCLAIM ALL RESPONSIBILITY FOR ANY POSSIBLE BAD EFFECTS OF THIS CODE!
 
-LICENSE
-=====
-This software is covered by the GNU General Public License. 
+License
+=======
+This software is covered by the GNU General Public License.
 See the file COPYING for the complete text of the license.
 Or the GNU website: <http://www.gnu.org/licenses/licenses.html>
 
-AUTHOR
-=====
+Author
+======
 The largest part of the code written by Will Dyson <[email protected]>
 He has been working on the code since Aug 13, 2001. See the changelog for
 details.
 
 Original Author: Makoto Kato <[email protected]>
+
 His original code can still be found at:
 <http://hp.vector.co.jp/authors/VA008030/bfs/>
+
 Does anyone know of a more current email address for Makoto? He doesn't
 respond to the address given above...
 
 This filesystem doesn't have a maintainer.
 
-WHAT IS THIS DRIVER?
-==================
-This module implements the native filesystem of BeOS http://www.beincorporated.com/ 
+What is this Driver?
+====================
+This module implements the native filesystem of BeOS http://www.beincorporated.com/
 for the linux 2.4.1 and later kernels. Currently it is a read-only
 implementation.
 
 Which is it, BFS or BEFS?
-================
-Be, Inc said, "BeOS Filesystem is officially called BFS, not BeFS". 
+=========================
+Be, Inc said, "BeOS Filesystem is officially called BFS, not BeFS".
 But Unixware Boot Filesystem is called bfs, too. And they are already in
 the kernel. Because of this naming conflict, on Linux the BeOS
 filesystem is called befs.
 
-HOW TO INSTALL
+How to Install
 ==============
 step 1.  Install the BeFS  patch into the source code tree of linux.
 
@@ -54,16 +60,16 @@ is called patch-befs-xxx, you would do the following:
 	patch -p1 < /path/to/patch-befs-xxx
 
 if the patching step fails (i.e. there are rejected hunks), you can try to
-figure it out yourself (it shouldn't be hard), or mail the maintainer 
+figure it out yourself (it shouldn't be hard), or mail the maintainer
 (Will Dyson <[email protected]>) for help.
 
 step 2.  Configuration & make kernel
 
 The linux kernel has many compile-time options. Most of them are beyond the
 scope of this document. I suggest the Kernel-HOWTO document as a good general
-reference on this topic. http://www.linuxdocs.org/HOWTOs/Kernel-HOWTO-4.html 
+reference on this topic. http://www.linuxdocs.org/HOWTOs/Kernel-HOWTO-4.html
 
-However, to use the BeFS module, you must enable it at configure time.
+However, to use the BeFS module, you must enable it at configure time::
 
 	cd /foo/bar/linux
 	make menuconfig (or xconfig)
@@ -82,35 +88,40 @@ step 3.  Install
 See the kernel howto <http://www.linux.com/howto/Kernel-HOWTO.html> for
 instructions on this critical step.
 
-USING BFS
+Using BFS
 =========
 To use the BeOS filesystem, use filesystem type 'befs'.
 
-ex)
+ex::
+
     mount -t befs /dev/fd0 /beos
 
-MOUNT OPTIONS
+Mount Options
 =============
+
+=============  ===========================================================
 uid=nnn        All files in the partition will be owned by user id nnn.
 gid=nnn	       All files in the partition will be in group nnn.
 iocharset=xxx  Use xxx as the name of the NLS translation table.
 debug          The driver will output debugging information to the syslog.
+=============  ===========================================================
 
-HOW TO GET LASTEST VERSION
+How to Get Lastest Version
 ==========================
 
 The latest version is currently available at:
 <http://befs-driver.sourceforge.net/>
 
-ANY KNOWN BUGS?
-===========
+Any Known Bugs?
+===============
 As of Jan 20, 2002:
-	
+
 	None
 
-SPECIAL THANKS
+Special Thanks
 ==============
 Dominic Giampalo ... Writing "Practical file system design with Be filesystem"
+
 Hiroyuki Yamada  ... Testing LinuxPPC.
 
 
diff --git a/Documentation/filesystems/bfs.txt b/Documentation/filesystems/bfs.rst
index 843ce91a2e40..ce14b9018807 100644
--- a/Documentation/filesystems/bfs.txt
+++ b/Documentation/filesystems/bfs.rst
@@ -1,4 +1,7 @@
-BFS FILESYSTEM FOR LINUX
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+BFS Filesystem for Linux
 ========================
 
 The BFS filesystem is used by SCO UnixWare OS for the /stand slice, which
@@ -9,22 +12,22 @@ In order to access /stand partition under Linux you obviously need to
 know the partition number and the kernel must support UnixWare disk slices
 (CONFIG_UNIXWARE_DISKLABEL config option). However BFS support does not
 depend on having UnixWare disklabel support because one can also mount
-BFS filesystem via loopback:
+BFS filesystem via loopback::
 
-# losetup /dev/loop0 stand.img
-# mount -t bfs /dev/loop0 /mnt/stand
+    # losetup /dev/loop0 stand.img
+    # mount -t bfs /dev/loop0 /mnt/stand
 
-where stand.img is a file containing the image of BFS filesystem. 
+where stand.img is a file containing the image of BFS filesystem.
 When you have finished using it and umounted you need to also deallocate
-/dev/loop0 device by:
+/dev/loop0 device by::
 
-# losetup -d /dev/loop0
+    # losetup -d /dev/loop0
 
-You can simplify mounting by just typing:
+You can simplify mounting by just typing::
 
-# mount -t bfs -o loop stand.img /mnt/stand
+    # mount -t bfs -o loop stand.img /mnt/stand
 
-this will allocate the first available loopback device (and load loop.o 
+this will allocate the first available loopback device (and load loop.o
 kernel module if necessary) automatically. If the loopback driver is not
 loaded automatically, make sure that you have compiled the module and
 that modprobe is functioning. Beware that umount will not deallocate
@@ -33,21 +36,21 @@ that modprobe is functioning. Beware that umount will not deallocate
 losetup(8). Read losetup(8) manpage for more info.
 
 To create the BFS image under UnixWare you need to find out first which
-slice contains it. The command prtvtoc(1M) is your friend:
+slice contains it. The command prtvtoc(1M) is your friend::
 
-# prtvtoc /dev/rdsk/c0b0t0d0s0
+    # prtvtoc /dev/rdsk/c0b0t0d0s0
 
 (assuming your root disk is on target=0, lun=0, bus=0, controller=0). Then you
 look for the slice with tag "STAND", which is usually slice 10. With this
-information you can use dd(1) to create the BFS image:
+information you can use dd(1) to create the BFS image::
 
-# umount /stand
-# dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
+    # umount /stand
+    # dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
 
 Just in case, you can verify that you have done the right thing by checking
-the magic number:
+the magic number::
 
-# od -Ad -tx4 stand.img | more
+    # od -Ad -tx4 stand.img | more
 
 The first 4 bytes should be 0x1badface.
 
diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.rst
index f9dad22d95ce..d0904f602819 100644
--- a/Documentation/filesystems/btrfs.txt
+++ b/Documentation/filesystems/btrfs.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====
 BTRFS
 =====
 
diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.rst
index b19b6a03f91c..0aa70750df0f 100644
--- a/Documentation/filesystems/ceph.txt
+++ b/Documentation/filesystems/ceph.rst
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
 Ceph Distributed File System
 ============================
 
@@ -15,6 +18,7 @@ Basic features include:
  * Easy deployment: most FS components are userspace daemons
 
 Also,
+
  * Flexible snapshots (on any directory)
  * Recursive accounting (nested files, directories, bytes)
 
@@ -63,7 +67,7 @@ no 'du' or similar recursive scan of the file system is required.
 Finally, Ceph also allows quotas to be set on any directory in the system.
 The quota can restrict the number of bytes or the number of files stored
 beneath that point in the directory hierarchy.  Quotas can be set using
-extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', eg:
+extended attributes 'ceph.quota.max_files' and 'ceph.quota.max_bytes', eg::
 
  setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir
  getfattr -n ceph.quota.max_bytes /some/dir
@@ -76,7 +80,7 @@ from writing as much data as it needs.
 
 Mount Syntax
 ============
-The basic mount syntax is:
+The basic mount syntax is::
 
  # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
 
@@ -84,7 +88,7 @@ You only need to specify a single monitor, as the client will get the
 full list when it connects.  (However, if the monitor you specify
 happens to be down, the mount won't succeed.)  The port can be left
 off if the monitor is using the default.  So if the monitor is at
-1.2.3.4,
+1.2.3.4::
 
  # mount -t ceph 1.2.3.4:/ /mnt/ceph
 
@@ -103,17 +107,17 @@ Mount Options
 	address its connection to the monitor originates from.
 
   wsize=X
-	Specify the maximum write size in bytes.  Default: 16 MB.
+	Specify the maximum write size in bytes.  Default: 64 MB.
 
   rsize=X
-	Specify the maximum read size in bytes.  Default: 16 MB.
+	Specify the maximum read size in bytes.  Default: 64 MB.
 
   rasize=X
 	Specify the maximum readahead size in bytes.  Default: 8 MB.
 
   mount_timeout=X
 	Specify the timeout value for mount (in seconds), in the case
-	of a non-responsive Ceph file system.  The default is 30
+	of a non-responsive Ceph file system.  The default is 60
 	seconds.
 
   caps_max=X
@@ -163,14 +167,14 @@ Mount Options
 	available modes are "no" and "clean". The default is "no".
 
 	* no: never attempt to reconnect when client detects that it has been
-	blacklisted. Operations will generally fail after being blacklisted.
+	  blacklisted. Operations will generally fail after being blacklisted.
 
 	* clean: client reconnects to the ceph cluster automatically when it
-	detects that it has been blacklisted. During reconnect, client drops
-	dirty data/metadata, invalidates page caches and writable file handles.
-	After reconnect, file locks become stale because the MDS loses track
-	of them. If an inode contains any stale file locks, read/write on the
-	inode is not allowed until applications release all stale file locks.
+	  detects that it has been blacklisted. During reconnect, client drops
+	  dirty data/metadata, invalidates page caches and writable file handles.
+	  After reconnect, file locks become stale because the MDS loses track
+	  of them. If an inode contains any stale file locks, read/write on the
+	  inode is not allowed until applications release all stale file locks.
 
 More Information
 ================
@@ -179,8 +183,8 @@ For more information on Ceph, see the home page at
 	https://ceph.com/
 
 The Linux kernel client source tree is available at
-	https://github.com/ceph/ceph-client.git
-	git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
+	- https://github.com/ceph/ceph-client.git
+	- git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git
 
 and the source for the full system is at
 	https://github.com/ceph/ceph.git
diff --git a/Documentation/filesystems/cifs/cifsroot.txt b/Documentation/filesystems/cifs/cifsroot.txt
index 0fa1a2c36a40..947b7ec6ce9e 100644
--- a/Documentation/filesystems/cifs/cifsroot.txt
+++ b/Documentation/filesystems/cifs/cifsroot.txt
@@ -13,7 +13,7 @@ network by utilizing SMB or CIFS protocol.
 
 In order to mount, the network stack will also need to be set up by
 using 'ip=' config option. For more details, see
-Documentation/filesystems/nfs/nfsroot.txt.
+Documentation/admin-guide/nfs/nfsroot.rst.
 
 A CIFS root mount currently requires the use of SMB1+UNIX Extensions
 which is only supported by the Samba server. SMB1 is the older
diff --git a/Documentation/filesystems/cramfs.txt b/Documentation/filesystems/cramfs.rst
index 8e19a53d648b..afbdbde98bd2 100644
--- a/Documentation/filesystems/cramfs.txt
+++ b/Documentation/filesystems/cramfs.rst
@@ -1,12 +1,15 @@
+.. SPDX-License-Identifier: GPL-2.0
 
-	Cramfs - cram a filesystem onto a small ROM
+===========================================
+Cramfs - cram a filesystem onto a small ROM
+===========================================
 
-cramfs is designed to be simple and small, and to compress things well. 
+cramfs is designed to be simple and small, and to compress things well.
 
 It uses the zlib routines to compress a file one page at a time, and
 allows random page access.  The meta-data is not compressed, but is
 expressed in a very terse representation to make it use much less
-diskspace than traditional filesystems. 
+diskspace than traditional filesystems.
 
 You can't write to a cramfs filesystem (making it compressible and
 compact also makes it _very_ hard to update on-the-fly), so you have to
@@ -28,9 +31,9 @@ issue.
 Hard links are supported, but hard linked files
 will still have a link count of 1 in the cramfs image.
 
-Cramfs directories have no `.' or `..' entries.  Directories (like
+Cramfs directories have no ``.`` or ``..`` entries.  Directories (like
 every other file on cramfs) always have a link count of 1.  (There's
-no need to use -noleaf in `find', btw.)
+no need to use -noleaf in ``find``, btw.)
 
 No timestamps are stored in a cramfs, so these default to the epoch
 (1970 GMT).  Recently-accessed files may have updated timestamps, but
@@ -70,9 +73,9 @@ MTD drivers are cfi_cmdset_0001 (Intel/Sharp CFI flash) or physmap
 (Flash device in physical memory map). MTD partitions based on such devices
 are fine too. Then that device should be specified with the "mtd:" prefix
 as the mount device argument. For example, to mount the MTD device named
-"fs_partition" on the /mnt directory:
+"fs_partition" on the /mnt directory::
 
-$ mount -t cramfs mtd:fs_partition /mnt
+    $ mount -t cramfs mtd:fs_partition /mnt
 
 To boot a kernel with this as root filesystem, suffice to specify
 something like "root=mtd:fs_partition" on the kernel command line.
@@ -90,6 +93,7 @@ https://github.com/npitre/cramfs-tools
 
 For /usr/share/magic
 --------------------
+=====	=======================	=======================
 0	ulelong	0x28cd3d45	Linux cramfs offset 0
 >4	ulelong	x		size %d
 >8	ulelong	x		flags 0x%x
@@ -110,6 +114,7 @@ For /usr/share/magic
 >552	ulelong	x		fsid.blocks %d
 >556	ulelong	x		fsid.files %d
 >560	string	>\0		name "%.16s"
+=====	=======================	=======================
 
 
 Hacker Notes
diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.rst
index 55336a47a110..6c032db235a5 100644
--- a/Documentation/filesystems/debugfs.txt
+++ b/Documentation/filesystems/debugfs.rst
@@ -1,4 +1,11 @@
-Copyright 2009 Jonathan Corbet <[email protected]>
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=======
+DebugFS
+=======
+
+Copyright |copy| 2009 Jonathan Corbet <[email protected]>
 
 Debugfs exists as a simple way for kernel developers to make information
 available to user space.  Unlike /proc, which is only meant for information
@@ -6,11 +13,11 @@ about a process, or sysfs, which has strict one-value-per-file rules,
 debugfs has no rules at all.  Developers can put any information they want
 there.  The debugfs filesystem is also intended to not serve as a stable
 ABI to user space; in theory, there are no stability constraints placed on
-files exported there.  The real world is not always so simple, though [1];
+files exported there.  The real world is not always so simple, though [1]_;
 even debugfs interfaces are best designed with the idea that they will need
 to be maintained forever.
 
-Debugfs is typically mounted with a command like:
+Debugfs is typically mounted with a command like::
 
     mount -t debugfs none /sys/kernel/debug
 
@@ -23,7 +30,7 @@ Note that the debugfs API is exported GPL-only to modules.
 
 Code using debugfs should include <linux/debugfs.h>.  Then, the first order
 of business will be to create at least one directory to hold a set of
-debugfs files:
+debugfs files::
 
     struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
 
@@ -36,7 +43,7 @@ something went wrong.  If ERR_PTR(-ENODEV) is returned, that is an
 indication that the kernel has been built without debugfs support and none
 of the functions described below will work.
 
-The most general way to create a file within a debugfs directory is with:
+The most general way to create a file within a debugfs directory is with::
 
     struct dentry *debugfs_create_file(const char *name, umode_t mode,
 				       struct dentry *parent, void *data,
@@ -53,12 +60,12 @@ ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is
 missing.
 
 Create a file with an initial size, the following function can be used
-instead:
+instead::
 
-    struct dentry *debugfs_create_file_size(const char *name, umode_t mode,
-				struct dentry *parent, void *data,
-				const struct file_operations *fops,
-				loff_t file_size);
+    void debugfs_create_file_size(const char *name, umode_t mode,
+				  struct dentry *parent, void *data,
+				  const struct file_operations *fops,
+				  loff_t file_size);
 
 file_size is the initial file size. The other parameters are the same
 as the function debugfs_create_file.
@@ -66,21 +73,21 @@ as the function debugfs_create_file.
 In a number of cases, the creation of a set of file operations is not
 actually necessary; the debugfs code provides a number of helper functions
 for simple situations.  Files containing a single integer value can be
-created with any of:
+created with any of::
 
     void debugfs_create_u8(const char *name, umode_t mode,
 			   struct dentry *parent, u8 *value);
     void debugfs_create_u16(const char *name, umode_t mode,
 			    struct dentry *parent, u16 *value);
-    struct dentry *debugfs_create_u32(const char *name, umode_t mode,
-				      struct dentry *parent, u32 *value);
+    void debugfs_create_u32(const char *name, umode_t mode,
+			    struct dentry *parent, u32 *value);
     void debugfs_create_u64(const char *name, umode_t mode,
 			    struct dentry *parent, u64 *value);
 
 These files support both reading and writing the given value; if a specific
 file should not be written to, simply set the mode bits accordingly.  The
 values in these files are in decimal; if hexadecimal is more appropriate,
-the following functions can be used instead:
+the following functions can be used instead::
 
     void debugfs_create_x8(const char *name, umode_t mode,
 			   struct dentry *parent, u8 *value);
@@ -94,7 +101,7 @@ the following functions can be used instead:
 These functions are useful as long as the developer knows the size of the
 value to be exported.  Some types can have different widths on different
 architectures, though, complicating the situation somewhat.  There are
-functions meant to help out in such special cases:
+functions meant to help out in such special cases::
 
     void debugfs_create_size_t(const char *name, umode_t mode,
 			       struct dentry *parent, size_t *value);
@@ -103,7 +110,7 @@ As might be expected, this function will create a debugfs file to represent
 a variable of type size_t.
 
 Similarly, there are helpers for variables of type unsigned long, in decimal
-and hexadecimal:
+and hexadecimal::
 
     struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
 					struct dentry *parent,
@@ -111,7 +118,7 @@ and hexadecimal:
     void debugfs_create_xul(const char *name, umode_t mode,
 			    struct dentry *parent, unsigned long *value);
 
-Boolean values can be placed in debugfs with:
+Boolean values can be placed in debugfs with::
 
     struct dentry *debugfs_create_bool(const char *name, umode_t mode,
 				       struct dentry *parent, bool *value);
@@ -120,7 +127,7 @@ A read on the resulting file will yield either Y (for non-zero values) or
 N, followed by a newline.  If written to, it will accept either upper- or
 lower-case values, or 1 or 0.  Any other input will be silently ignored.
 
-Also, atomic_t values can be placed in debugfs with:
+Also, atomic_t values can be placed in debugfs with::
 
     void debugfs_create_atomic_t(const char *name, umode_t mode,
 				 struct dentry *parent, atomic_t *value)
@@ -129,7 +136,7 @@ A read of this file will get atomic_t values, and a write of this file
 will set atomic_t values.
 
 Another option is exporting a block of arbitrary binary data, with
-this structure and function:
+this structure and function::
 
     struct debugfs_blob_wrapper {
 	void *data;
@@ -151,7 +158,7 @@ If you want to dump a block of registers (something that happens quite
 often during development, even if little such code reaches mainline.
 Debugfs offers two functions: one to make a registers-only file, and
 another to insert a register block in the middle of another sequential
-file.
+file::
 
     struct debugfs_reg32 {
 	char *name;
@@ -175,7 +182,7 @@ The "base" argument may be 0, but you may want to build the reg32 array
 using __stringify, and a number of register names (macros) are actually
 byte offsets over a base for the register block.
 
-If you want to dump an u32 array in debugfs, you can create file with:
+If you want to dump an u32 array in debugfs, you can create file with::
 
     void debugfs_create_u32_array(const char *name, umode_t mode,
 			struct dentry *parent,
@@ -185,7 +192,7 @@ The "array" argument provides data, and the "elements" argument is
 the number of elements in the array. Note: Once array is created its
 size can not be changed.
 
-There is a helper function to create device related seq_file:
+There is a helper function to create device related seq_file::
 
    struct dentry *debugfs_create_devm_seqfile(struct device *dev,
 				const char *name,
@@ -197,14 +204,14 @@ The "dev" argument is the device related to this debugfs file, and
 the "read_fn" is a function pointer which to be called to print the
 seq_file content.
 
-There are a couple of other directory-oriented helper functions:
+There are a couple of other directory-oriented helper functions::
 
-    struct dentry *debugfs_rename(struct dentry *old_dir, 
+    struct dentry *debugfs_rename(struct dentry *old_dir,
     				  struct dentry *old_dentry,
-		                  struct dentry *new_dir, 
+		                  struct dentry *new_dir,
 				  const char *new_name);
 
-    struct dentry *debugfs_create_symlink(const char *name, 
+    struct dentry *debugfs_create_symlink(const char *name,
                                           struct dentry *parent,
 				      	  const char *target);
 
@@ -219,7 +226,7 @@ module is unloaded without explicitly removing debugfs entries, the result
 will be a lot of stale pointers and no end of highly antisocial behavior.
 So all debugfs users - at least those which can be built as modules - must
 be prepared to remove all files and directories they create there.  A file
-can be removed with:
+can be removed with::
 
     void debugfs_remove(struct dentry *dentry);
 
@@ -229,7 +236,7 @@ be removed.
 Once upon a time, debugfs users were required to remember the dentry
 pointer for every debugfs file they created so that all files could be
 cleaned up.  We live in more civilized times now, though, and debugfs users
-can call:
+can call::
 
     void debugfs_remove_recursive(struct dentry *dentry);
 
@@ -237,5 +244,4 @@ If this function is passed a pointer for the dentry corresponding to the
 top-level directory, the entire hierarchy below that directory will be
 removed.
 
-Notes:
-	[1] http://lwn.net/Articles/309298/
+.. [1] http://lwn.net/Articles/309298/
diff --git a/Documentation/filesystems/dlmfs.txt b/Documentation/filesystems/dlmfs.rst
index fcf4d509d118..68daaa7facf9 100644
--- a/Documentation/filesystems/dlmfs.txt
+++ b/Documentation/filesystems/dlmfs.rst
@@ -1,20 +1,25 @@
-dlmfs
-==================
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+=====
+DLMFS
+=====
+
 A minimal DLM userspace interface implemented via a virtual file
 system.
 
 dlmfs is built with OCFS2 as it requires most of its infrastructure.
 
-Project web page:    http://ocfs2.wiki.kernel.org
-Tools web page:      https://github.com/markfasheh/ocfs2-tools
-OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
+:Project web page:    http://ocfs2.wiki.kernel.org
+:Tools web page:      https://github.com/markfasheh/ocfs2-tools
+:OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/
 
 All code copyright 2005 Oracle except when otherwise noted.
 
-CREDITS
+Credits
 =======
 
-Some code taken from ramfs which is Copyright (C) 2000 Linus Torvalds
+Some code taken from ramfs which is Copyright |copy| 2000 Linus Torvalds
 and Transmeta Corp.
 
 Mark Fasheh <[email protected]>
@@ -96,14 +101,19 @@ operation. If the lock succeeds, you'll get an fd.
 open(2) with O_CREAT to ensure the resource inode is created - dlmfs does
 not automatically create inodes for existing lock resources.
 
+============  ===========================
 Open Flag     Lock Request Type
----------     -----------------
+============  ===========================
 O_RDONLY      Shared Read
 O_RDWR        Exclusive
+============  ===========================
 
+
+============  ===========================
 Open Flag     Resulting Locking Behavior
----------     --------------------------
+============  ===========================
 O_NONBLOCK    Trylock operation
+============  ===========================
 
 You must provide exactly one of O_RDONLY or O_RDWR.
 
diff --git a/Documentation/filesystems/ecryptfs.txt b/Documentation/filesystems/ecryptfs.rst
index 01d8a08351ac..1f2edef4c57a 100644
--- a/Documentation/filesystems/ecryptfs.txt
+++ b/Documentation/filesystems/ecryptfs.rst
@@ -1,14 +1,18 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================================
 eCryptfs: A stacked cryptographic filesystem for Linux
+======================================================
 
 eCryptfs is free software. Please see the file COPYING for details.
 For documentation, please see the files in the doc/ subdirectory.  For
 building and installation instructions please see the INSTALL file.
 
-Maintainer: Phillip Hellewell
-Lead developer: Michael A. Halcrow <[email protected]>
-Developers: Michael C. Thompson
-            Kent Yoder
-Web Site: http://ecryptfs.sf.net
+:Maintainer: Phillip Hellewell
+:Lead developer: Michael A. Halcrow <[email protected]>
+:Developers: Michael C. Thompson
+             Kent Yoder
+:Web Site: http://ecryptfs.sf.net
 
 This software is currently undergoing development. Make sure to
 maintain a backup copy of any data you write into eCryptfs.
@@ -19,34 +23,36 @@ SourceForge site:
 http://sourceforge.net/projects/ecryptfs/
 
 Userspace requirements include:
- - David Howells' userspace keyring headers and libraries (version
-   1.0 or higher), obtainable from
-   http://people.redhat.com/~dhowells/keyutils/
- - Libgcrypt
+
+- David Howells' userspace keyring headers and libraries (version
+  1.0 or higher), obtainable from
+  http://people.redhat.com/~dhowells/keyutils/
+- Libgcrypt
 
 
-NOTES
+.. note::
 
-In the beta/experimental releases of eCryptfs, when you upgrade
-eCryptfs, you should copy the files to an unencrypted location and
-then copy the files back into the new eCryptfs mount to migrate the
-files.
+   In the beta/experimental releases of eCryptfs, when you upgrade
+   eCryptfs, you should copy the files to an unencrypted location and
+   then copy the files back into the new eCryptfs mount to migrate the
+   files.
 
 
-MOUNT-WIDE PASSPHRASE
+Mount-wide Passphrase
+=====================
 
 Create a new directory into which eCryptfs will write its encrypted
 files (i.e., /root/crypt).  Then, create the mount point directory
-(i.e., /mnt/crypt).  Now it's time to mount eCryptfs:
+(i.e., /mnt/crypt).  Now it's time to mount eCryptfs::
 
-mount -t ecryptfs /root/crypt /mnt/crypt
+    mount -t ecryptfs /root/crypt /mnt/crypt
 
 You should be prompted for a passphrase and a salt (the salt may be
 blank).
 
-Try writing a new file:
+Try writing a new file::
 
-echo "Hello, World" > /mnt/crypt/hello.txt
+    echo "Hello, World" > /mnt/crypt/hello.txt
 
 The operation will complete.  Notice that there is a new file in
 /root/crypt that is at least 12288 bytes in size (depending on your
@@ -59,10 +65,13 @@ keyctl clear @u
 Then umount /mnt/crypt and mount again per the instructions given
 above.
 
-cat /mnt/crypt/hello.txt
+::
+
+    cat /mnt/crypt/hello.txt
 
 
-NOTES
+Notes
+=====
 
 eCryptfs version 0.1 should only be mounted on (1) empty directories
 or (2) directories containing files only created by eCryptfs. If you
diff --git a/Documentation/filesystems/efivarfs.txt b/Documentation/filesystems/efivarfs.rst
index 686a64bba775..90ac65683e7e 100644
--- a/Documentation/filesystems/efivarfs.txt
+++ b/Documentation/filesystems/efivarfs.rst
@@ -1,5 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
 
+=======================================
 efivarfs - a (U)EFI variable filesystem
+=======================================
 
 The efivarfs filesystem was created to address the shortcomings of
 using entries in sysfs to maintain EFI variables. The old sysfs EFI
@@ -11,7 +14,7 @@ than a single page, sysfs isn't the best interface for this.
 Variables can be created, deleted and modified with the efivarfs
 filesystem.
 
-efivarfs is typically mounted like this,
+efivarfs is typically mounted like this::
 
 	mount -t efivarfs none /sys/firmware/efi/efivars
 
diff --git a/Documentation/filesystems/erofs.txt b/Documentation/filesystems/erofs.rst
index db6d39c3ae71..bf145171c2bf 100644
--- a/Documentation/filesystems/erofs.txt
+++ b/Documentation/filesystems/erofs.rst
@@ -1,3 +1,9 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Enhanced Read-Only File System - EROFS
+======================================
+
 Overview
 ========
 
@@ -6,6 +12,7 @@ from other read-only file systems, it aims to be designed for flexibility,
 scalability, but be kept simple and high performance.
 
 It is designed as a better filesystem solution for the following scenarios:
+
  - read-only storage media or
  - part of a fully trusted read-only solution, which means it needs to be
@@ -17,6 +24,7 @@ It is designed as a better filesystem solution for the following scenarios:
    for those embedded devices with limited memory (ex, smartphone);
 Here is the main features of EROFS:
+
  - Little endian on-disk design;
  - Currently 4KB block size (nobh) and therefore maximum 16TB address space;
@@ -24,13 +32,17 @@ Here is the main features of EROFS:
  - Metadata & data could be mixed by design;
  - 2 inode versions for different requirements:
+
+   =====================  ============  =====================================
                           compact (v1)  extended (v2)
-   Inode metadata size:   32 bytes      64 bytes
-   Max file size:         4 GB          16 EB (also limited by max. vol size)
-   Max uids/gids:         65536         4294967296
-   File change time:      no            yes (64 + 32-bit timestamp)
-   Max hardlinks:         65536         4294967296
-   Metadata reserved:     4 bytes       14 bytes
+   =====================  ============  =====================================
+   Inode metadata size    32 bytes      64 bytes
+   Max file size          4 GB          16 EB (also limited by max. vol size)
+   Max uids/gids          65536         4294967296
+   File change time       no            yes (64 + 32-bit timestamp)
+   Max hardlinks          65536         4294967296
+   Metadata reserved      4 bytes       14 bytes
+   =====================  ============  =====================================
  - Support extended attributes (xattrs) as an option;
@@ -43,29 +55,36 @@ Here is the main features of EROFS:
 The following git tree provides the file system user-space tools under
 development (ex, formatting tool mkfs.erofs):
->> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
+
+- git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
 Bugs and patches are welcome, please kindly help us and send to the following
 linux-erofs mailing list:
->> linux-erofs mailing list   <[email protected]>
+
+- linux-erofs mailing list   <[email protected]>
 Mount options
 =============
+===================    =========================================================
 (no)user_xattr         Setup Extended User Attributes. Note: xattr is enabled
                        by default if CONFIG_EROFS_FS_XATTR is selected.
 (no)acl                Setup POSIX Access Control List. Note: acl is enabled
                        by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
 cache_strategy=%s      Select a strategy for cached decompression from now on:
-                         disabled: In-place I/O decompression only;
-                        readahead: Cache the last incomplete compressed physical
+
+		       ==========  =============================================
+                         disabled  In-place I/O decompression only;
+                        readahead  Cache the last incomplete compressed physical
                                    cluster for further reading. It still does
                                    in-place I/O decompression for the rest
                                    compressed physical clusters;
-                       readaround: Cache the both ends of incomplete compressed
+                       readaround  Cache the both ends of incomplete compressed
                                    physical clusters for further reading.
                                    It still does in-place I/O decompression
                                    for the rest compressed physical clusters.
+		       ==========  =============================================
+===================    =========================================================
 On-disk details
 ===============
@@ -73,7 +92,7 @@ On-disk details
 Summary
 -------
 Different from other read-only file systems, an EROFS volume is designed
-to be as simple as possible:
+to be as simple as possible::
                                 |-> aligned with the block size
    ____________________________________________________________
@@ -83,41 +102,45 @@ to be as simple as possible:
 All data areas should be aligned with the block size, but metadata areas
 may not. All metadatas can be now observed in two different spaces (views):
+
  1. Inode metadata space
+
     Each valid inode should be aligned with an inode slot, which is a fixed
     value (32 bytes) and designed to be kept in line with compact inode size.
     Each inode can be directly found with the following formula:
          inode offset = meta_blkaddr * block_size + 32 * nid
-                                |-> aligned with 8B
-                                           |-> followed closely
-    + meta_blkaddr blocks                                      |-> another slot
-     _____________________________________________________________________
-    |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
-    |________|_______|(optional)|(optional)|__(optional)_|_____|__________
-             |-> aligned with the inode slot size
-                  .                   .
-                .                         .
-              .                              .
-            .                                    .
-          .                                         .
-        .                                              .
-      .____________________________________________________|-> aligned with 4B
-      | xattr_ibody_header | shared xattrs | inline xattrs |
-      |____________________|_______________|_______________|
-      |->    12 bytes    <-|->x * 4 bytes<-|               .
-                          .                .                 .
-                    .                      .                   .
-               .                           .                     .
-           ._______________________________.______________________.
-           | id | id | id | id |  ... | id | ent | ... | ent| ... |
-           |____|____|____|____|______|____|_____|_____|____|_____|
-                                           |-> aligned with 4B
-                                                       |-> aligned with 4B
+    ::
+
+				    |-> aligned with 8B
+					    |-> followed closely
+	+ meta_blkaddr blocks                                      |-> another slot
+	_____________________________________________________________________
+	|  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
+	|________|_______|(optional)|(optional)|__(optional)_|_____|__________
+		|-> aligned with the inode slot size
+		    .                   .
+		    .                         .
+		.                              .
+		.                                    .
+	    .                                         .
+	    .                                              .
+	.____________________________________________________|-> aligned with 4B
+	| xattr_ibody_header | shared xattrs | inline xattrs |
+	|____________________|_______________|_______________|
+	|->    12 bytes    <-|->x * 4 bytes<-|               .
+			    .                .                 .
+			.                      .                   .
+		.                           .                     .
+	    ._______________________________.______________________.
+	    | id | id | id | id |  ... | id | ent | ... | ent| ... |
+	    |____|____|____|____|______|____|_____|_____|____|_____|
+					    |-> aligned with 4B
+							|-> aligned with 4B
     Inode could be 32 or 64 bytes, which can be distinguished from a common
-    field which all inode versions have -- i_format:
+    field which all inode versions have -- i_format::
         __________________               __________________
        |     i_format     |             |     i_format     |
@@ -132,16 +155,19 @@ may not. All metadatas can be now observed in two different spaces (views):
     proper alignment, and they could be optional for different data mappings.
     _currently_ total 4 valid data mappings are supported:
+    ==  ====================================================================
      0  flat file data without data inline (no extent);
      1  fixed-sized output data compression (with non-compacted indexes);
      2  flat file data with tail packing data inline (no extent);
      3  fixed-sized output data compression (with compacted indexes, v5.3+).
+    ==  ====================================================================
     The size of the optional xattrs is indicated by i_xattr_count in inode
     header. Large xattrs or xattrs shared by many different files can be
     stored in shared xattrs metadata rather than inlined right after inode.
  2. Shared xattrs metadata space
+
     Shared xattrs space is similar to the above inode space, started with
     a specific block indicated by xattr_blkaddr, organized one by one with
     proper align.
@@ -149,11 +175,13 @@ may not. All metadatas can be now observed in two different spaces (views):
     Each share xattr can also be directly found by the following formula:
          xattr offset = xattr_blkaddr * block_size + 4 * xattr_id
-                           |-> aligned by  4 bytes
-    + xattr_blkaddr blocks                     |-> aligned with 4 bytes
-     _________________________________________________________________________
-    |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
-    |________|_____________|_____________|_____|______________|_______________
+    ::
+
+			    |-> aligned by  4 bytes
+	+ xattr_blkaddr blocks                     |-> aligned with 4 bytes
+	_________________________________________________________________________
+	|  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
+	|________|_____________|_____________|_____|______________|_______________
 Directories
 -----------
@@ -163,19 +191,21 @@ random file lookup, and all directory entries are _strictly_ recorded in
 alphabetical order in order to support improved prefix binary search
 algorithm (could refer to the related source code).
-                 ___________________________
-                /                           |
-               /              ______________|________________
-              /              /              | nameoff1       | nameoffN-1
- ____________.______________._______________v________________v__________
-| dirent | dirent | ... | dirent | filename | filename | ... | filename |
-|___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
-     \                           ^
-      \                          |                           * could have
-       \                         |                             trailing '\0'
-        \________________________| nameoff0
+::
+
+		    ___________________________
+		    /                           |
+		/              ______________|________________
+		/              /              | nameoff1       | nameoffN-1
+    ____________.______________._______________v________________v__________
+    | dirent | dirent | ... | dirent | filename | filename | ... | filename |
+    |___.0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
+	\                           ^
+	\                          |                           * could have
+	\                         |                             trailing '\0'
+	    \________________________| nameoff0
-                             Directory block
+				Directory block
 Note that apart from the offset of the first filename, nameoff0 also indicates
 the total number of directory entries in this block since it is no need to
@@ -184,28 +214,27 @@ introduce another on-disk field at all.
 Compression
 -----------
 Currently, EROFS supports 4KB fixed-sized output transparent file compression,
-as illustrated below:
-
-         |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
-         clusterofs                      clusterofs            clusterofs
-         |                               |                     |   logical data
-_________v_______________________________v_____________________v_______________
-... |    .        |             |        .    |             |  .          | ...
-____|____.________|_____________|________.____|_____________|__.__________|____
-    |-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
-         size          size          size          size          size
-          .                             .                .                   .
-           .                       .               .                  .
-            .                  .              .                .
-      _______._____________._____________._____________._____________________
-         ... |             |             |             | ... physical data
-      _______|_____________|_____________|_____________|_____________________
-             |-> cluster <-|-> cluster <-|-> cluster <-|
-                  size          size          size
+as illustrated below::
+
+	    |---- Variant-Length Extent ----|-------- VLE --------|----- VLE -----
+	    clusterofs                      clusterofs            clusterofs
+	    |                               |                     |   logical data
+    _________v_______________________________v_____________________v_______________
+    ... |    .        |             |        .    |             |  .          | ...
+    ____|____.________|_____________|________.____|_____________|__.__________|____
+	|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|-> cluster <-|
+	    size          size          size          size          size
+	    .                             .                .                   .
+	    .                       .               .                  .
+		.                  .              .                .
+	_______._____________._____________._____________._____________________
+	    ... |             |             |             | ... physical data
+	_______|_____________|_____________|_____________|_____________________
+		|-> cluster <-|-> cluster <-|-> cluster <-|
+		    size          size          size
 Currently each on-disk physical cluster can contain 4KB (un)compressed data
 at most. For each logical cluster, there is a corresponding on-disk index to
 describe its cluster type, physical cluster address, etc.
 See "struct z_erofs_vle_decompressed_index" in erofs_fs.h for more details.
-
diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.rst
index 94c2cf0292f5..d83dbbb162e2 100644
--- a/Documentation/filesystems/ext2.txt
+++ b/Documentation/filesystems/ext2.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
 The Second Extended Filesystem
 ==============================
@@ -14,8 +16,9 @@ Options
 Most defaults are determined by the filesystem superblock, and can be
 set using tune2fs(8). Kernel-determined defaults are indicated by (*).
-bsddf			(*)	Makes `df' act like BSD.
-minixdf				Makes `df' act like Minix.
+====================    ===     ================================================
+bsddf			(*)	Makes ``df`` act like BSD.
+minixdf				Makes ``df`` act like Minix.
 check=none, nocheck	(*)	Don't do extra checking of bitmaps on mount
 				(check=normal and check=strict options removed)
@@ -62,6 +65,7 @@ quota, usrquota			Enable user disk quota support
 grpquota			Enable group disk quota support
 				(requires CONFIG_QUOTA).
+====================    ===     ================================================
 noquota option ls silently ignored by ext2.
@@ -294,9 +298,9 @@ respective fsck programs.
 If you're exceptionally paranoid, there are 3 ways of making metadata
 writes synchronous on ext2:
-per-file if you have the program source: use the O_SYNC flag to open()
-per-file if you don't have the source: use "chattr +S" on the file
-per-filesystem: add the "sync" option to mount (or in /etc/fstab)
+- per-file if you have the program source: use the O_SYNC flag to open()
+- per-file if you don't have the source: use "chattr +S" on the file
+- per-filesystem: add the "sync" option to mount (or in /etc/fstab)
 the first and last are not ext2 specific but do force the metadata to
 be written synchronously.  See also Journaling below.
@@ -316,10 +320,12 @@ Most of these limits could be overcome with slight changes in the on-disk
 format and using a compatibility flag to signal the format change (at
 the expense of some compatibility).
-Filesystem block size:     1kB        2kB        4kB        8kB
-
-File size limit:          16GB      256GB     2048GB     2048GB
-Filesystem size limit:  2047GB     8192GB    16384GB    32768GB
+=====================  =======    =======    =======   ========
+Filesystem block size      1kB        2kB        4kB        8kB
+=====================  =======    =======    =======   ========
+File size limit           16GB      256GB     2048GB     2048GB
+Filesystem size limit   2047GB     8192GB    16384GB    32768GB
+=====================  =======    =======    =======   ========
 There is a 2.4 kernel limit of 2048GB for a single block device, so no
 filesystem larger than that can be created at this time.  There is also
@@ -370,19 +376,24 @@ ext4 and journaling.
 References
 ==========
+=======================	===============================================
 The kernel source	file:/usr/src/linux/fs/ext2/
 e2fsprogs (e2fsck)	http://e2fsprogs.sourceforge.net/
 Design & Implementation	http://e2fsprogs.sourceforge.net/ext2intro.html
 Journaling (ext3)	ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
 Filesystem Resizing	http://ext2resize.sourceforge.net/
-Compression (*)		http://e2compr.sourceforge.net/
+Compression [1]_	http://e2compr.sourceforge.net/
+=======================	===============================================
 Implementations for:
+
+=======================	===========================================================
 Windows 95/98/NT/2000	http://www.chrysocome.net/explore2fs
-Windows 95 (*)		http://www.yipton.net/content.html#FSDEXT2
-DOS client (*)		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
-OS/2 (+)		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
+Windows 95 [1]_		http://www.yipton.net/content.html#FSDEXT2
+DOS client [1]_		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
+OS/2 [2]_		ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
 RISC OS client		http://www.esw-heim.tu-clausthal.de/~marco/smorbrod/IscaFS/
+=======================	===========================================================
-(*) no longer actively developed/supported (as of Apr 2001)
-(+) no longer actively developed/supported (as of Mar 2009)
+.. [1] no longer actively developed/supported (as of Apr 2001)
+.. [2] no longer actively developed/supported (as of Mar 2009)
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.rst
index 58758fbef9e0..c06cec3a8fdc 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.rst
@@ -1,4 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+===============
 Ext3 Filesystem
 ===============
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.rst
index 4eb3e2ddd00e..87d794bc75a4 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.rst
@@ -1,6 +1,8 @@
-================================================================================
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================================
 WHAT IS Flash-Friendly File System (F2FS)?
-================================================================================
+==========================================
 NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards, have
 been equipped on a variety systems ranging from mobile to server systems. Since
@@ -20,14 +22,15 @@ layout, but also for selecting allocation and cleaning algorithms.
 The following git tree provides the file system formatting tool (mkfs.f2fs),
 a consistency checking tool (fsck.f2fs), and a debugging tool (dump.f2fs).
->> git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
+
+- git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
 For reporting bugs and sending patches, please use the following mailing list:
-================================================================================
-BACKGROUND AND DESIGN ISSUES
-================================================================================
+
+Background and Design issues
+============================
 Log-structured File System (LFS)
 --------------------------------
@@ -61,6 +64,7 @@ needs to reclaim these obsolete blocks seamlessly to users. This job is called
 as a cleaning process.
 The process consists of three operations as follows.
+
 1. A victim segment is selected through referencing segment usage table.
 2. It loads parent index structures of all the data in the victim identified by
    segment summary blocks.
@@ -71,9 +75,8 @@ This cleaning job may cause unexpected long delays, so the most important goal
 is to hide the latencies to users. And also definitely, it should reduce the
 amount of valid data to be moved, and move them quickly as well.
-================================================================================
-KEY FEATURES
-================================================================================
+Key Features
+============
 Flash Awareness
 ---------------
@@ -94,10 +97,11 @@ Cleaning Overhead
 - Support multi-head logs for static/dynamic hot and cold data separation
 - Introduce adaptive logging for efficient block allocation
-================================================================================
-MOUNT OPTIONS
-================================================================================
+Mount Options
+=============
+
+====================== ============================================================
 background_gc=%s       Turn on/off cleaning operations, namely garbage
                        collection, triggered in background when I/O subsystem is
                        idle. If background_gc=on, it will turn on the garbage
@@ -167,7 +171,10 @@ fault_injection=%d     Enable fault injection in all supported types with
 fault_type=%d          Support configuring fault injection type, should be
                        enabled with fault_injection option, fault type value
                        is shown below, it supports single or combined type.
+
+                       ===================	===========
                        Type_Name		Type_Value
+                       ===================	===========
                        FAULT_KMALLOC		0x000000001
                        FAULT_KVMALLOC		0x000000002
                        FAULT_PAGE_ALLOC		0x000000004
@@ -183,6 +190,7 @@ fault_type=%d          Support configuring fault injection type, should be
                        FAULT_CHECKPOINT		0x000001000
                        FAULT_DISCARD		0x000002000
                        FAULT_WRITE_IO		0x000004000
+                       ===================	===========
 mode=%s                Control block allocation mode which supports "adaptive"
                        and "lfs". In "lfs" mode, there should be no random
                        writes towards main area.
@@ -219,7 +227,7 @@ fsync_mode=%s          Control the policy of fsync. Currently supports "posix",
                        non-atomic files likewise "nobarrier" mount option.
 test_dummy_encryption  Enable dummy encryption, which provides a fake fscrypt
                        context. The fake fscrypt context is used by xfstests.
-checkpoint=%s[:%u[%]]     Set to "disable" to turn off checkpointing. Set to "enable"
+checkpoint=%s[:%u[%]]  Set to "disable" to turn off checkpointing. Set to "enable"
                        to reenable checkpointing. Is enabled by default. While
                        disabled, any unmounting or unexpected shutdowns will cause
                        the filesystem contents to appear as they did when the
@@ -235,8 +243,8 @@ checkpoint=%s[:%u[%]]     Set to "disable" to turn off checkpointing. Set to "en
                        hide up to all remaining free space. The actual space that
                        would be unusable can be viewed at /sys/fs/f2fs/<disk>/unusable
                        This space is reclaimed once checkpoint=enable.
-compress_algorithm=%s  Control compress algorithm, currently f2fs supports "lzo"
-                       and "lz4" algorithm.
+compress_algorithm=%s  Control compress algorithm, currently f2fs supports "lzo",
+                       "lz4" and "zstd" algorithm.
 compress_log_size=%u   Support configuring compress cluster size, the size will
                        be 4KB * (1 << %u), 16KB is minimum size, also it's
                        default size.
@@ -246,22 +254,22 @@ compress_extension=%s  Support adding specified extension, so that f2fs can enab
                        on compression extension list and enable compression on
                        these file by default rather than to enable it via ioctl.
                        For other files, we can still enable compression via ioctl.
+====================== ============================================================
-================================================================================
-DEBUGFS ENTRIES
-================================================================================
+Debugfs Entries
+===============
 /sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
 f2fs. Each file shows the whole f2fs information.
 /sys/kernel/debug/f2fs/status includes:
+
  - major file system information managed by f2fs currently
  - average SIT information about whole segments
  - current memory footprint consumed by f2fs.
-================================================================================
-SYSFS ENTRIES
-================================================================================
+Sysfs Entries
+=============
 Information about mounted f2fs file systems can be found in
 /sys/fs/f2fs.  Each mounted filesystem will have a directory in
@@ -271,22 +279,24 @@ The files in each per-device directory are shown in table below.
 Files in /sys/fs/f2fs/<devname>
 (see also Documentation/ABI/testing/sysfs-fs-f2fs)
-================================================================================
-USAGE
-================================================================================
+Usage
+=====
 1. Download userland tools and compile them.
 2. Skip, if f2fs was compiled statically inside kernel.
-   Otherwise, insert the f2fs.ko module.
- # insmod f2fs.ko
+   Otherwise, insert the f2fs.ko module::
+
+	# insmod f2fs.ko
-3. Create a directory trying to mount
- # mkdir /mnt/f2fs
+3. Create a directory trying to mount::
-4. Format the block device, and then mount as f2fs
- # mkfs.f2fs -l label /dev/block_device
- # mount -t f2fs /dev/block_device /mnt/f2fs
+	# mkdir /mnt/f2fs
+
+4. Format the block device, and then mount as f2fs::
+
+	# mkfs.f2fs -l label /dev/block_device
+	# mount -t f2fs /dev/block_device /mnt/f2fs
 mkfs.f2fs
 ---------
@@ -294,18 +304,26 @@ The mkfs.f2fs is for the use of formatting a partition as the f2fs filesystem,
 which builds a basic on-disk layout.
 The options consist of:
--l [label]   : Give a volume label, up to 512 unicode name.
--a [0 or 1]  : Split start location of each area for heap-based allocation.
-               1 is set by default, which performs this.
--o [int]     : Set overprovision ratio in percent over volume size.
-               5 is set by default.
--s [int]     : Set the number of segments per section.
-               1 is set by default.
--z [int]     : Set the number of sections per zone.
-               1 is set by default.
--e [str]     : Set basic extension list. e.g. "mp3,gif,mov"
--t [0 or 1]  : Disable discard command or not.
-               1 is set by default, which conducts discard.
+
+===============    ===========================================================
+``-l [label]``     Give a volume label, up to 512 unicode name.
+``-a [0 or 1]``    Split start location of each area for heap-based allocation.
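+
+                   1 is set by default, which performs this.
+``-o [int]``       Set overprovision ratio in percent over volume size.
+
+                   5 is set by default.
+``-s [int]``       Set the number of segments per section.
+
+                   1 is set by default.
+``-z [int]``       Set the number of sections per zone.
+
+                   1 is set by default.
+``-e [str]``       Set basic extension list. e.g. "mp3,gif,mov"
+``-t [0 or 1]``    Disable discard command or not.
+
+                   1 is set by default, which conducts discard.
+===============    ===========================================================
 fsck.f2fs
 ---------
@@ -314,7 +332,8 @@ partition, which examines whether the filesystem metadata and user-made data
 are cross-referenced correctly or not.
 Note that, initial version of the tool does not fix any inconsistency.
-The options consist of:
+The options consist of::
+
   -d debug level [default:0]
 dump.f2fs
@@ -327,20 +346,21 @@ It shows on-disk inode information recognized by a given inode number, and is
 able to dump all the SSA and SIT entries into predefined files, ./dump_ssa and
 ./dump_sit respectively.
-The options consist of:
+The options consist of::
+
   -d debug level [default:0]
   -i inode no (hex)
   -s [SIT dump segno from #1~#2 (decimal), for all 0~-1]
   -a [SSA dump segno from #1~#2 (decimal), for all 0~-1]
-Examples:
-# dump.f2fs -i [ino] /dev/sdx
-# dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
-# dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
+Examples::
+
+    # dump.f2fs -i [ino] /dev/sdx
+    # dump.f2fs -s 0~-1 /dev/sdx (SIT dump)
+    # dump.f2fs -a 0~-1 /dev/sdx (SSA dump)
-================================================================================
-DESIGN
-================================================================================
+Design
+======
 On-disk Layout
 --------------
@@ -351,7 +371,7 @@ consists of a set of sections. By default, section and zone sizes are set to one
 segment size identically, but users can easily modify the sizes by mkfs.
 F2FS splits the entire volume into six areas, and all the areas except superblock
-consists of multiple segments as described below.
+consists of multiple segments as described below::
                                             align with the zone size <-|
                  |-> align with the segment size
@@ -373,28 +393,28 @@ consists of multiple segments as described below.
 	                            |__zone__|
 - Superblock (SB)
- : It is located at the beginning of the partition, and there exist two copies
+   It is located at the beginning of the partition, and there exist two copies
    to avoid file system crash. It contains basic partition information and some
    default parameters of f2fs.
 - Checkpoint (CP)
- : It contains file system information, bitmaps for valid NAT/SIT sets, orphan
+   It contains file system information, bitmaps for valid NAT/SIT sets, orphan
    inode lists, and summary entries of current active segments.
 - Segment Information Table (SIT)
- : It contains segment information such as valid block count and bitmap for the
+   It contains segment information such as valid block count and bitmap for the
    validity of all the blocks.
 - Node Address Table (NAT)
- : It is composed of a block address table for all the node blocks stored in
+   It is composed of a block address table for all the node blocks stored in
    Main area.
 - Segment Summary Area (SSA)
- : It contains summary entries which contains the owner information of all the
+   It contains summary entries which contains the owner information of all the
    data and node blocks stored in Main area.
 - Main Area
- : It contains file and directory data including their indices.
+   It contains file and directory data including their indices.
 In order to avoid misalignment between file system and flash-based storage, F2FS
 aligns the start block address of CP with the segment size. Also, it aligns the
@@ -414,7 +434,7 @@ One of them always indicates the last valid data, which is called as shadow copy
 mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
 For file system consistency, each CP points to which NAT and SIT copies are
-valid, as shown as below.
+valid, as shown as below::
   +--------+----------+---------+
   |   CP   |    SIT   |   NAT   |
@@ -438,7 +458,7 @@ indirect node. F2FS assigns 4KB to an inode block which contains 923 data block
 indices, two direct node pointers, two indirect node pointers, and one double
 indirect node pointer as described below. One direct node block contains 1018
 data blocks, and one indirect node block contains also 1018 node blocks. Thus,
-one inode block (i.e., a file) covers:
+one inode block (i.e., a file) covers::
   4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
@@ -473,6 +493,8 @@ A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
 used to represent whether each dentry is valid or not. A dentry block occupies
 4KB with the following composition.
+::
+
   Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
 	              dentries(11 * 214 bytes) + file name (8 * 214 bytes)
@@ -498,23 +520,25 @@ F2FS implements multi-level hash tables for directory structure. Each level has
 a hash table with dedicated number of hash buckets as shown below. Note that
 "A(2B)" means a bucket includes 2 data blocks.
-----------------------
-A : bucket
-B : block
-N : MAX_DIR_HASH_DEPTH
-----------------------
+::
+
+    ----------------------
+    A : bucket
+    B : block
+    N : MAX_DIR_HASH_DEPTH
+    ----------------------
-level #0   | A(2B)
-           |
-level #1   | A(2B) - A(2B)
-           |
-level #2   | A(2B) - A(2B) - A(2B) - A(2B)
-     .     |   .       .       .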
 In order to avoid misalignment between file system and flash-based storage, F2FS
 aligns the start block address of CP with the segment size. Also, it aligns the
@@ -414,7 +434,7 @@ One of them always indicates the last valid data, which is called as shadow copy
 mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
 
 For file system consistency, each CP points to which NAT and SIT copies are
-valid, as shown as below.
+valid, as shown as below::
 
   +--------+----------+---------+
   |   CP   |    SIT   |   NAT   |
@@ -438,7 +458,7 @@ indirect node. F2FS assigns 4KB to an inode block which contains 923 data block
 indices, two direct node pointers, two indirect node pointers, and one double
 indirect node pointer as described below. One direct node block contains 1018
 data blocks, and one indirect node block contains also 1018 node blocks. Thus,
-one inode block (i.e., a file) covers:
+one inode block (i.e., a file) covers::
 
   4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
 
@@ -473,6 +493,8 @@ A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
 used to represent whether each dentry is valid or not. A dentry block occupies
 4KB with the following composition.
 
+::
+
   Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
 	              dentries(11 * 214 bytes) + file name (8 * 214 bytes)
 
@@ -498,23 +520,25 @@ F2FS implements multi-level hash tables for directory structure. Each level has
 a hash table with dedicated number of hash buckets as shown below. Note that
 "A(2B)" means a bucket includes 2 data blocks.
 
------------------------
-A : bucket
-B : block
-N : MAX_DIR_HASH_DEPTH
------------------------
+::
+
+    ----------------------
+    A : bucket
+    B : block
+    N : MAX_DIR_HASH_DEPTH
+    ----------------------
 
-level #0   | A(2B)
-           |
-level #1   | A(2B) - A(2B)
-           |
-level #2   | A(2B) - A(2B) - A(2B) - A(2B)
-     .     |   .       .       .       .
-level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
-     .     |   .       .       .       .
-level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
+    level #0   | A(2B)
+	    |
+    level #1   | A(2B) - A(2B)
+	    |
+    level #2   | A(2B) - A(2B) - A(2B) - A(2B)
+	.     |   .       .       .       .
+    level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
+	.     |   .       .       .       .
+    level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
 
-The number of blocks and buckets are determined by,
+The number of blocks and buckets are determined by::
 
                             ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
   # of blocks in level #n = |
@@ -532,7 +556,7 @@ dentry consisting of the file name and its inode number. If not found, F2FS
 scans the next hash table in level #1. In this way, F2FS scans hash tables in
 each levels incrementally from 1 to N. In each levels F2FS needs to scan only
 one bucket determined by the following equation, which shows O(log(# of files))
-complexity.
+complexity::
 
   bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
 
@@ -540,7 +564,8 @@ In the case of file creation, F2FS finds empty consecutive slots that cover the
 file name. F2FS searches the empty slots in the hash tables of whole levels from
 1 to N in the same way as the lookup operation.
 
-The following figure shows an example of two cases holding children.
+The following figure shows an example of two cases holding children::
+
        --------------> Dir <--------------
        |                                 |
     child                             child
@@ -611,14 +636,15 @@ Write-hint Policy
 2) whint_mode=user-based. F2FS tries to pass down hints given by
 users.
 
+===================== ======================== ===================
 User                  F2FS                     Block
-----                  ----                     -----
+===================== ======================== ===================
                       META                     WRITE_LIFE_NOT_SET
                       HOT_NODE                 "
                       WARM_NODE                "
                       COLD_NODE                "
-*ioctl(COLD)          COLD_DATA                WRITE_LIFE_EXTREME
-*extension list       "                        "
+ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
+extension list        "                        "
 
 -- buffered io
 WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
@@ -635,11 +661,13 @@ WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
 WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
 WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
 WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+===================== ======================== ===================
 
 3) whint_mode=fs-based. F2FS passes down hints with its policy.
 
+===================== ======================== ===================
 User                  F2FS                     Block
-----                  ----                     -----
+===================== ======================== ===================
                       META                     WRITE_LIFE_MEDIUM;
                       HOT_NODE                 WRITE_LIFE_NOT_SET
                       WARM_NODE                "
@@ -662,6 +690,7 @@ WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
 WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
 WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
 WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
+===================== ======================== ===================
 
 Fallocate(2) Policy
 -------------------
@@ -681,6 +710,7 @@ Allocating disk space
 However, once F2FS receives ioctl(fd, F2FS_IOC_SET_PIN_FILE) in prior to
 fallocate(fd, DEFAULT_MODE), it allocates on-disk blocks addressess having
 zero or random data, which is useful to the below scenario where:
+
  1. create(fd)
  2. ioctl(fd, F2FS_IOC_SET_PIN_FILE)
  3. fallocate(fd, 0, 0, size)
@@ -692,39 +722,41 @@ Compression implementation
 --------------------------
 
 - New term named cluster is defined as basic unit of compression, file can
-be divided into multiple clusters logically. One cluster includes 4 << n
-(n >= 0) logical pages, compression size is also cluster size, each of
-cluster can be compressed or not.
+  be divided into multiple clusters logically. One cluster includes 4 << n
+  (n >= 0) logical pages, compression size is also cluster size, each of
+  cluster can be compressed or not.
 
 - In cluster metadata layout, one special block address is used to indicate
-cluster is compressed one or normal one, for compressed cluster, following
-metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs
-stores data including compress header and compressed data.
+  cluster is compressed one or normal one, for compressed cluster, following
+  metadata maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs
+  stores data including compress header and compressed data.
 
 - In order to eliminate write amplification during overwrite, F2FS only
-support compression on write-once file, data can be compressed only when
-all logical blocks in file are valid and cluster compress ratio is lower
-than specified threshold.
+  support compression on write-once file, data can be compressed only when
+  all logical blocks in file are valid and cluster compress ratio is lower
+  than specified threshold.
 
 - To enable compression on regular inode, there are three ways:
-* chattr +c file
-* chattr +c dir; touch dir/file
-* mount w/ -o compress_extension=ext; touch file.ext
-
-Compress metadata layout:
-                             [Dnode Structure]
-             +-----------------------------------------------+
-             | cluster 1 | cluster 2 | ......... | cluster N |
-             +-----------------------------------------------+
-             .           .                       .           .
-       .                       .                .                      .
-  .         Compressed Cluster       .        .        Normal Cluster            .
-+----------+---------+---------+---------+  +---------+---------+---------+---------+
-|compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
-+----------+---------+---------+---------+  +---------+---------+---------+---------+
-           .                             .
-         .                                           .
-       .                                                           .
-      +-------------+-------------+----------+----------------------------+
-      | data length | data chksum | reserved |      compressed data       |
-      +-------------+-------------+----------+----------------------------+
+
+  * chattr +c file
+  * chattr +c dir; touch dir/file
+  * mount w/ -o compress_extension=ext; touch file.ext
+
+Compress metadata layout::
+
+				[Dnode Structure]
+		+-----------------------------------------------+
+		| cluster 1 | cluster 2 | ......... | cluster N |
+		+-----------------------------------------------+
+		.           .                       .           .
+	.                       .                .                      .
+    .         Compressed Cluster       .        .        Normal Cluster            .
+    +----------+---------+---------+---------+  +---------+---------+---------+---------+
+    |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
+    +----------+---------+---------+---------+  +---------+---------+---------+---------+
+	    .                             .
+	    .                                           .
+	.                                                           .
+	+-------------+-------------+----------+----------------------------+
+	| data length | data chksum | reserved |      compressed data       |
+	+-------------+-------------+----------+----------------------------+
diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt
index f6d9c99103a4..ac87e6fda842 100644
--- a/Documentation/filesystems/fiemap.txt
+++ b/Documentation/filesystems/fiemap.txt
@@ -115,8 +115,10 @@ data. Note that the opposite is not true - it would be valid for
 FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
 
 * FIEMAP_EXTENT_LAST
-This is the last extent in the file. A mapping attempt past this
-extent will return nothing.
+This is generally the last extent in the file. A mapping attempt past
+this extent may return nothing. Some implementations set this flag to
+indicate this extent is the last one in the range queried by the user
+(via fiemap->fm_length).
 
 * FIEMAP_EXTENT_UNKNOWN
 The location of this extent is currently unknown. This may indicate
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
index bd9932344804..aa072112cfff 100644
--- a/Documentation/filesystems/fscrypt.rst
+++ b/Documentation/filesystems/fscrypt.rst
@@ -633,6 +633,17 @@ from a passphrase or other low-entropy user credential.
 FS_IOC_GET_ENCRYPTION_PWSALT is deprecated.  Instead, prefer to
 generate and manage any needed salt(s) in userspace.
 
+Getting a file's encryption nonce
+---------------------------------
+
+Since Linux v5.7, the ioctl FS_IOC_GET_ENCRYPTION_NONCE is supported.
+On encrypted files and directories it gets the inode's 16-byte nonce.
+On unencrypted files and directories, it fails with ENODATA.
+
+This ioctl can be useful for automated tests which verify that the
+encryption is being done correctly.  It is not needed for normal use
+of fscrypt.
+
 Adding keys
 -----------
 
diff --git a/Documentation/filesystems/fuse.rst b/Documentation/filesystems/fuse.rst
index 8e455065ce9e..cd717f9bf940 100644
--- a/Documentation/filesystems/fuse.rst
+++ b/Documentation/filesystems/fuse.rst
@@ -1,7 +1,8 @@
 .. SPDX-License-Identifier: GPL-2.0
-==============
+
+====
 FUSE
-==============
+====
 
 Definitions
 ===========
diff --git a/Documentation/filesystems/gfs2-uevents.txt b/Documentation/filesystems/gfs2-uevents.rst
index 19a19ebebc34..f162a2c76c69 100644
--- a/Documentation/filesystems/gfs2-uevents.txt
+++ b/Documentation/filesystems/gfs2-uevents.rst
@@ -1,14 +1,18 @@
-                              uevents and GFS2
-                             ==================
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+uevents and GFS2
+================
 
 During the lifetime of a GFS2 mount, a number of uevents are generated.
 This document explains what the events are and what they are used
 for (by gfs_controld in gfs2-utils).
 
 A list of GFS2 uevents
------------------------
+======================
 
 1. ADD
+------
 
 The ADD event occurs at mount time. It will always be the first
 uevent generated by the newly created filesystem. If the mount
@@ -21,6 +25,7 @@ with no journal assigned), and read-only (with journal assigned) status
 of the filesystem respectively.
 
 2. ONLINE
+---------
 
 The ONLINE uevent is generated after a successful mount or remount. It
 has the same environment variables as the ADD uevent. The ONLINE
@@ -29,6 +34,7 @@ RDONLY are a relatively recent addition (2.6.32-rc+) and will not
 be generated by older kernels.
 
 3. CHANGE
+---------
 
 The CHANGE uevent is used in two places. One is when reporting the
 successful mount of the filesystem by the first node (FIRSTMOUNT=Done).
@@ -52,6 +58,7 @@ cluster. For this reason the ONLINE uevent was used when adding a new
 uevent for a successful mount or remount.
 
 4. OFFLINE
+----------
 
 The OFFLINE uevent is only generated due to filesystem errors and is used
 as part of the "withdraw" mechanism. Currently this doesn't give any
@@ -59,6 +66,7 @@ information about what the error is, which is something that needs to
 be fixed.
 
 5. REMOVE
+---------
 
 The REMOVE uevent is generated at the end of an unsuccessful mount
 or at the end of a umount of the filesystem. All REMOVE uevents will
@@ -68,9 +76,10 @@ kobject subsystem.
 
 
 Information common to all GFS2 uevents (uevent environment variables)
-----------------------------------------------------------------------
+=====================================================================
 
 1. LOCKTABLE=
+--------------
 
 The LOCKTABLE is a string, as supplied on the mount command
 line (locktable=) or via fstab. It is used as a filesystem label
@@ -78,6 +87,7 @@ as well as providing the information for a lock_dlm mount to be
 able to join the cluster.
 
 2. LOCKPROTO=
+-------------
 
 The LOCKPROTO is a string, and its value depends on what is set
 on the mount command line, or via fstab. It will be either
@@ -85,12 +95,14 @@ lock_nolock or lock_dlm. In the future other lock managers
 may be supported.
 
 3. JOURNALID=
+-------------
 
 If a journal is in use by the filesystem (journals are not
 assigned for spectator mounts) then this will give the
 numeric journal id in all GFS2 uevents.
 
 4. UUID=
+--------
 
 With recent versions of gfs2-utils, mkfs.gfs2 writes a UUID
 into the filesystem superblock. If it exists, this will
diff --git a/Documentation/filesystems/gfs2.txt b/Documentation/filesystems/gfs2.rst
index cc4f2306609e..8d1ab589ce18 100644
--- a/Documentation/filesystems/gfs2.txt
+++ b/Documentation/filesystems/gfs2.rst
@@ -1,5 +1,8 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
 Global File System
-------------------
+==================
 
 https://fedorahosted.org/cluster/wiki/HomePage
 
@@ -14,16 +17,18 @@ on one machine show up immediately on all other machines in the cluster.
 GFS uses interchangeable inter-node locking mechanisms, the currently
 supported mechanisms are:
 
-  lock_nolock -- allows gfs to be used as a local file system
+  lock_nolock
+    - allows gfs to be used as a local file system
 
-  lock_dlm -- uses a distributed lock manager (dlm) for inter-node locking
-  The dlm is found at linux/fs/dlm/
+  lock_dlm
+    - uses a distributed lock manager (dlm) for inter-node locking.
+      The dlm is found at linux/fs/dlm/
 
 Lock_dlm depends on user space cluster management systems found
 at the URL above.
 
 To use gfs as a local file system, no external clustering systems are
-needed, simply:
+needed, simply::
 
   $ mkfs -t gfs2 -p lock_nolock -j 1 /dev/block_device
   $ mount -t gfs2 /dev/block_device /dir
@@ -37,9 +42,12 @@ GFS2 is not on-disk compatible with previous versions of GFS, but it
 is pretty close.
 
 The following man pages can be found at the URL above:
The following man pages can be found at the URL above: + +  ============		=============================================    fsck.gfs2		to repair a filesystem    gfs2_grow		to expand a filesystem online    gfs2_jadd		to add journals to a filesystem online    tunegfs2		to manipulate, examine and tune a filesystem -  gfs2_convert	to convert a gfs filesystem to gfs2 in-place +  gfs2_convert		to convert a gfs filesystem to gfs2 in-place    mkfs.gfs2		to make a filesystem +  ============		============================================= diff --git a/Documentation/filesystems/hfs.txt b/Documentation/filesystems/hfs.rst index d096df6db07a..ab17a005e9b1 100644 --- a/Documentation/filesystems/hfs.txt +++ b/Documentation/filesystems/hfs.rst @@ -1,11 +1,16 @@ -Note: This filesystem doesn't have a maintainer. +.. SPDX-License-Identifier: GPL-2.0 +==================================  Macintosh HFS Filesystem for Linux  ================================== -HFS stands for ``Hierarchical File System'' and is the filesystem used + +.. Note:: This filesystem doesn't have a maintainer. + + +HFS stands for ``Hierarchical File System`` and is the filesystem used  by the Mac Plus and all later Macintosh models.  Earlier Macintosh -models used MFS (``Macintosh File System''), which is not supported, +models used MFS (``Macintosh File System``), which is not supported,  MacOS 8.1 and newer support a filesystem called HFS+ that's similar to  HFS but is extended in various areas.  Use the hfsplus filesystem driver  to access such filesystems from Linux. @@ -49,25 +54,25 @@ Writing to HFS Filesystems  HFS is not a UNIX filesystem, thus it does not have the usual features you'd  expect: - o You can't modify the set-uid, set-gid, sticky or executable bits or the uid + * You can't modify the set-uid, set-gid, sticky or executable bits or the uid     and gid of files. - o You can't create hard- or symlinks, device files, sockets or FIFOs. 
+ * You can't create hard- or symlinks, device files, sockets or FIFOs.
 
 HFS does on the other have the concepts of multiple forks per file.  These
 non-standard forks are represented as hidden additional files in the normal
 filesystems namespace which is kind of a cludge and makes the semantics for
 the a little strange:
 
- o You can't create, delete or rename resource forks of files or the
+ * You can't create, delete or rename resource forks of files or the
    Finder's metadata.
- o They are however created (with default values), deleted and renamed
+ * They are however created (with default values), deleted and renamed
    along with the corresponding data fork or directory.
- o Copying files to a different filesystem will loose those attributes
+ * Copying files to a different filesystem will loose those attributes
    that are essential for MacOS to work.
 
 
 Creating HFS filesystems
-===================================
+========================
 
 The hfsutils package from Robert Leslie contains a program called
 hformat that can be used to create HFS filesystem. See
diff --git a/Documentation/filesystems/hfsplus.txt b/Documentation/filesystems/hfsplus.rst
index 59f7569fc9ed..f02f4f5fc020 100644
--- a/Documentation/filesystems/hfsplus.txt
+++ b/Documentation/filesystems/hfsplus.rst
@@ -1,4 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
 
+======================================
 Macintosh HFSPlus Filesystem for Linux
 ======================================
 
diff --git a/Documentation/filesystems/hpfs.txt b/Documentation/filesystems/hpfs.rst
index 74630bd504fb..0db152278572 100644
--- a/Documentation/filesystems/hpfs.txt
+++ b/Documentation/filesystems/hpfs.rst
@@ -1,13 +1,21 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
 Read/Write HPFS 2.09
+====================
+
 1998-2004, Mikulas Patocka
 
-email: [email protected]
-homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
+:email: [email protected]
+:homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
 
-CREDITS:
+Credits
+=======
 Chris Smith, 1993, original read-only HPFS, some code and hpfs structures file
 	is taken from it
+
 Jacques Gelinas, MSDos mmap, Inspired by fs/nfs/mmap.c (Jon Tombs 15 Aug 1993)
+
 Werner Almesberger, 1992, 1993, MSDos option parser & CR/LF conversion
 
 Mount options
@@ -50,6 +58,7 @@ timeshift=(-)nnn (default 0)
 
 
 File names
+==========
 
 As in OS/2, filenames are case insensitive. However, shell thinks that names
 are case sensitive, so for example when you create a file FOO, you can use
@@ -64,6 +73,7 @@ access it under names 'a.', 'a..', 'a .  . . ' etc.
 
 
 Extended attributes
+===================
 
 On HPFS partitions, OS/2 can associate to each file a special information called
 extended attributes. Extended attributes are pairs of (key,value) where key is
@@ -88,6 +98,7 @@ values doesn't work.
 
 
 Symlinks
+========
 
 You can do symlinks on HPFS partition, symlinks are achieved by setting extended
 attribute named "SYMLINK" with symlink value. Like on ext2, you can chown and
@@ -101,6 +112,7 @@ to analyze or change OS2SYS.INI.
 
 
 Codepages
+=========
 
 HPFS can contain several uppercasing tables for several codepages and each
 file has a pointer to codepage its name is in. However OS/2 was created in
@@ -128,6 +140,7 @@ this codepage - if you don't try to do what I described above :-)
 
 
 Known bugs
+==========
 
 HPFS386 on OS/2 server is not supported. HPFS386 installed on normal OS/2 client
 should work. If you have OS/2 server, use only read-only mode. I don't know how
@@ -152,7 +165,8 @@ would result in directory tree splitting, that takes disk space. Workaround is
 to delete other files that are leaf (probability that the file is non-leaf is
 about 1/50) or to truncate file first to make some space.
 You encounter this problem only if you have many directories so that
-preallocated directory band is full i.e.
+preallocated directory band is full i.e.::
+
 	number_of_directories / size_of_filesystem_in_mb > 4.
 
 You can't delete open directories.
@@ -174,6 +188,7 @@ anybody know what does it mean?
 
 
 What does "unbalanced tree" message mean?
+=========================================
 
 Old versions of this driver created sometimes unbalanced dnode trees. OS/2
 chkdsk doesn't scream if the tree is unbalanced (and sometimes creates
@@ -187,6 +202,7 @@ whole created by this driver, it is BUG - let me know about it.
 
 
 Bugs in OS/2
+============
 
 When you have two (or more) lost directories pointing each to other, chkdsk
 locks up when repairing filesystem.
@@ -199,98 +215,139 @@ File names like "a .b" are marked as 'long' by OS/2 but chkdsk "corrects" it and
 marks them as short (and writes "minor fs error corrected"). This bug is not in
 HPFS386.
 
-Codepage bugs described above.
+Codepage bugs described above
+=============================
 
 If you don't install fixpacks, there are many, many more...
 
 
 History
+=======
+
+====== =========================================================================
+0.90   First public release
+0.91   Fixed bug that caused shooting to memory when write_inode was called on
+       open inode (rarely happened)
+0.92   Fixed a little memory leak in freeing directory inodes
+0.93   Fixed bug that locked up the machine when there were too many filenames
+       with first 15 characters same
+       Fixed write_file to zero file when writing behind file end
+0.94   Fixed a little memory leak when trying to delete busy file or directory
+0.95   Fixed a bug that i_hpfs_parent_dir was not updated when moving files
+1.90   First version for 2.1.1xx kernels
+1.91   Fixed a bug that chk_sectors failed when sectors were at the end of disk
+       Fixed a race-condition when write_inode is called while deleting file
+       Fixed a bug that could possibly happen (with very low probability) when
+       using 0xff in filenames.
+
+       Rewritten locking to avoid race-conditions
+
+       Mount option 'eas' now works
+
+       Fsync no longer returns error
+
+       Files beginning with '.' are marked hidden
+
+       Remount support added
+
+       Alloc is not so slow when filesystem becomes full
+
+       Atimes are no more updated because it slows down operation
+
+       Code cleanup (removed all commented debug prints)
+1.92   Corrected a bug when sync was called just before closing file
+1.93   Modified, so that it works with kernels >= 2.1.131, I don't know if it
+       works with previous versions
+
+       Fixed a possible problem with disks > 64G (but I don't have one, so I can't
+       test it)
+
+       Fixed a file overflow at 2G
+
+       Added new option 'timeshift'
+
+       Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
+       read-only mode
+
+       Fixed a bug that slowed down alloc and prevented allocating 100% space
+       (this bug was not destructive)
+1.94   Added workaround for one bug in Linux
+
+       Fixed one buffer leak
+
+       Fixed some incompatibilities with large extended attributes (but it's still
+       not 100% ok, I have no info on it and OS/2 doesn't want to create them)
+
+       Rewritten allocation
 
-0.90 First public release
-0.91 Fixed bug that caused shooting to memory when write_inode was called on
-	open inode (rarely happened)
-0.92 Fixed a little memory leak in freeing directory inodes
-0.93 Fixed bug that locked up the machine when there were too many filenames
-	with first 15 characters same
-     Fixed write_file to zero file when writing behind file end
-0.94 Fixed a little memory leak when trying to delete busy file or directory
-0.95 Fixed a bug that i_hpfs_parent_dir was not updated when moving files
-1.90 First version for 2.1.1xx kernels
-1.91 Fixed a bug that chk_sectors failed when sectors were at the end of disk
-     Fixed a race-condition when write_inode is called while deleting file
-     Fixed a bug that could possibly happen (with very low probability) when
-     	using 0xff in filenames
-     Rewritten locking to avoid race-conditions
-     Mount option 'eas' now works
-     Fsync no longer returns error
-     Files beginning with '.' are marked hidden
-     Remount support added
-     Alloc is not so slow when filesystem becomes full
-     Atimes are no more updated because it slows down operation
-     Code cleanup (removed all commented debug prints)
-1.92 Corrected a bug when sync was called just before closing file
-1.93 Modified, so that it works with kernels >= 2.1.131, I don't know if it
-	works with previous versions
-     Fixed a possible problem with disks > 64G (but I don't have one, so I can't
-     	test it)
-     Fixed a file overflow at 2G
-     Added new option 'timeshift'
-     Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
-     	read-only mode
-     Fixed a bug that slowed down alloc and prevented allocating 100% space
-     	(this bug was not destructive)
-1.94 Added workaround for one bug in Linux
-     Fixed one buffer leak
-     Fixed some incompatibilities with large extended attributes (but it's still
-	not 100% ok, I have no info on it and OS/2 doesn't want to create them)
-     Rewritten allocation
-     Fixed a bug with i_blocks (du sometimes didn't display correct values)
-     Directories have no longer archive attribute set (some programs don't like
-	it)
-     Fixed a bug that it set badly one flag in large anode tree (it was not
-	destructive)
-1.95 Fixed one buffer leak, that could happen on corrupted filesystem
-     Fixed one bug in allocation in 1.94
-1.96 Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
-	error sometimes when opening directories in PMSHELL)
-     Fixed a possible bitmap race
-     Fixed possible problem on large disks
-     You can now delete open files
-     Fixed a nondestructive race in rename
-1.97 Support for HPFS v3 (on large partitions)
-     Fixed a bug that it didn't allow creation of files > 128M (it should be 2G)
+       Fixed a bug with i_blocks (du sometimes didn't display correct values)
+
+       Directories have no longer archive attribute set (some programs don't like
+       it)
+
+       Fixed a bug that it set badly one flag in large anode tree (it was not
+       destructive)
+1.95   Fixed one buffer leak, that could happen on corrupted filesystem
+
+       Fixed one bug in allocation in 1.94
+1.96   Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
+       error sometimes when opening directories in PMSHELL)
+
+       Fixed a possible bitmap race
+
+       Fixed possible problem on large disks
+
+       You can now delete open files
+
+       Fixed a nondestructive race in rename
+1.97   Support for HPFS v3 (on large partitions)
+
+       ZFixed a bug that it didn't allow creation of files > 128M
+       (it should be 2G)
 1.97.1 Changed names of global symbols
+
        Fixed a bug when chmoding or chowning root directory
-1.98 Fixed a deadlock when using old_readdir
-     Better directory handling; workaround for "unbalanced tree" bug in OS/2
-1.99 Corrected a possible problem when there's not enough space while deleting
-	file
-     Now it tries to truncate the file if there's not enough space when deleting
-     Removed a lot of redundant code
-2.00 Fixed a bug in rename (it was there since 1.96)
-     Better anti-fragmentation strategy
-2.01 Fixed problem with directory listing over NFS
-     Directory lseek now checks for proper parameters
-     Fixed race-condition in buffer code - it is in all filesystems in Linux;
-        when reading device (cat /dev/hda) while creating files on it, files
-        could be damaged
-2.02 Workaround for bug in breada in Linux. breada could cause accesses beyond
-        end of partition
-2.03 Char, block devices and pipes are correctly created
-     Fixed non-crashing race in unlink (Alexander Viro)
-     Now it works with Japanese version of OS/2
-2.04 Fixed error when ftruncate used to extend file
-2.05 Fixed crash when got mount parameters without =
-     Fixed crash when allocation of anode failed due to full disk
-     Fixed some crashes when block io or inode allocation failed
-2.06 Fixed some crash on corrupted disk structures
-     Better allocation strategy
-     Reschedule points added so that it doesn't lock CPU long time
-     It should work in read-only mode on Warp Server
-2.07 More fixes for Warp Server. Now it really works
-2.08 Creating new files is not so slow on large disks
-     An attempt to sync deleted file does not generate filesystem error
-2.09 Fixed error on extremely fragmented files
-
-
- vim: set textwidth=80:
+1.98   Fixed a deadlock when using old_readdir
+       Better directory handling; workaround for "unbalanced tree" bug in OS/2
+1.99   Corrected a possible problem when there's not enough space while deleting
+       file
+
+       Now it tries to truncate the file if there's not enough space when
+       deleting
+
+       Removed a lot of redundant code
+2.00   Fixed a bug in rename (it was there since 1.96)
+       Better anti-fragmentation strategy
+2.01   Fixed problem with directory listing over NFS
+
+       Directory lseek now checks for proper parameters
+
+       Fixed race-condition in buffer code - it is in all filesystems in Linux;
+       when reading device (cat /dev/hda) while creating files on it, files
+       could be damaged
+2.02   Workaround for bug in breada in Linux. breada could cause accesses beyond
+       end of partition
+2.03   Char, block devices and pipes are correctly created
+
+       Fixed non-crashing race in unlink (Alexander Viro)
+
+       Now it works with Japanese version of OS/2
+2.04   Fixed error when ftruncate used to extend file
+2.05   Fixed crash when got mount parameters without =
+
+       Fixed crash when allocation of anode failed due to full disk
+
+       Fixed some crashes when block io or inode allocation failed
+2.06   Fixed some crash on corrupted disk structures
+
+       Better allocation strategy
+
+       Reschedule points added so that it doesn't lock CPU long time
+
+       It should work in read-only mode on Warp Server
+2.07   More fixes for Warp Server. Now it really works
+2.08   Creating new files is not so slow on large disks
+
+       An attempt to sync deleted file does not generate filesystem error
+2.09   Fixed error on extremely fragmented files
+====== =========================================================================
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index 386eaad008b2..e7b46dac7079 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -1,3 +1,5 @@
+.. _filesystems_index:
+
 ===============================
 Filesystems in the Linux kernel
 ===============================
@@ -46,8 +48,53 @@ Documentation for filesystem implementations.
 .. toctree::
    :maxdepth: 2
 
+   9p
+   adfs
+   affs
+   afs
    autofs
+   autofs-mount-control
+   befs
+   bfs
+   btrfs
+   ceph
+   cramfs
+   debugfs
+   dlmfs
+   ecryptfs
+   efivarfs
+   erofs
+   ext2
+   ext3
+   f2fs
+   gfs2
+   gfs2-uevents
+   hfs
+   hfsplus
+   hpfs
    fuse
+   inotify
+   isofs
+   nilfs2
+   nfs/index
+   ntfs
+   ocfs2
+   ocfs2-online-filecheck
+   omfs
+   orangefs
    overlayfs
+   proc
+   qnx6
+   ramfs-rootfs-initramfs
+   relay
+   romfs
+   squashfs
+   sysfs
+   sysv-fs
+   tmpfs
+   ubifs
+   ubifs-authentication.rst
+   udf
    virtiofs
    vfat
+   zonefs
diff --git a/Documentation/filesystems/inotify.txt b/Documentation/filesystems/inotify.rst
index 51f61db787fb..7f7ef8af0e1e 100644
--- a/Documentation/filesystems/inotify.txt
+++ b/Documentation/filesystems/inotify.rst
@@ -1,27 +1,36 @@
-				   inotify
-	    a powerful yet simple file change notification system
+.. SPDX-License-Identifier: GPL-2.0
+
+===============================================================
+Inotify - A Powerful yet Simple File Change Notification System
+===============================================================
 
 
 
 Document started 15 Mar 2005 by Robert Love <[email protected]>
+
 Document updated 4 Jan 2015 by Zhang Zhen <[email protected]>
-	--Deleted obsoleted interface, just refer to manpages for user interface.
+
+	- Deleted obsoleted interface, just refer to manpages for user interface.
 
 (i) Rationale
 
-Q: What is the design decision behind not tying the watch to the open fd of
+Q:
+   What is the design decision behind not tying the watch to the open fd of
    the watched object?
 
-A: Watches are associated with an open inotify device, not an open file.
+A:
+   Watches are associated with an open inotify device, not an open file.
    This solves the primary problem with dnotify: keeping the file open pins
    the file and thus, worse, pins the mount.  Dnotify is therefore infeasible
Dnotify is therefore infeasible     for use on a desktop system with removable media as the media cannot be     unmounted.  Watching a file should not require that it be open. -Q: What is the design decision behind using an-fd-per-instance as opposed to +Q: +   What is the design decision behind using an-fd-per-instance as opposed to     an fd-per-watch? -A: An fd-per-watch quickly consumes more file descriptors than are allowed, +A: +   An fd-per-watch quickly consumes more file descriptors than are allowed,     more fd's than are feasible to manage, and more fd's than are optimally     select()-able.  Yes, root can bump the per-process fd limit and yes, users     can use epoll, but requiring both is a silly and extraneous requirement. @@ -29,8 +38,8 @@ A: An fd-per-watch quickly consumes more file descriptors than are allowed,     spaces is thus sensible.  The current design is what user-space developers     want: Users initialize inotify, once, and add n watches, requiring but one     fd and no twiddling with fd limits.  Initializing an inotify instance two -   thousand times is silly.  If we can implement user-space's preferences  -   cleanly--and we can, the idr layer makes stuff like this trivial--then we  +   thousand times is silly.  If we can implement user-space's preferences +   cleanly--and we can, the idr layer makes stuff like this trivial--then we     should.     There are other good arguments.  With a single fd, there is a single @@ -65,9 +74,11 @@ A: An fd-per-watch quickly consumes more file descriptors than are allowed,     need not be a one-fd-per-process mapping; it is one-fd-per-queue and a     process can easily want more than one queue. -Q: Why the system call approach? +Q: +   Why the system call approach? -A: The poor user-space interface is the second biggest problem with dnotify. +A: +   The poor user-space interface is the second biggest problem with dnotify.     Signals are a terrible, terrible interface for file notification.  
Or for     anything, for that matter.  The ideal solution, from all perspectives, is a     file descriptor-based one that allows basic file I/O and poll/select. diff --git a/Documentation/filesystems/isofs.rst b/Documentation/filesystems/isofs.rst new file mode 100644 index 000000000000..08fd469091d4 --- /dev/null +++ b/Documentation/filesystems/isofs.rst @@ -0,0 +1,64 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================== +ISO9660 Filesystem +================== + +Mount options that are the same as for msdos and vfat partitions. + +  =========	======================================================== +  gid=nnn	All files in the partition will be in group nnn. +  uid=nnn	All files in the partition will be owned by user id nnn. +  umask=nnn	The permission mask (see umask(1)) for the partition. +  =========	======================================================== + +Mount options that are the same as vfat partitions. These are only useful +when using discs encoded using Microsoft's Joliet extensions. + + ==============	============================================================= + iocharset=name Character set to use for converting from Unicode to +		ASCII.  Joliet filenames are stored in Unicode format, but +		Unix for the most part doesn't know how to deal with Unicode. +		There is also an option of doing UTF-8 translations with the +		utf8 option. +  utf8          Encode Unicode names in UTF-8 format. Default is no. + ==============	============================================================= + +Mount options unique to the isofs filesystem. 
+
+ ================= ============================================================
+  block=512        Set the block size for the disk to 512 bytes
+  block=1024       Set the block size for the disk to 1024 bytes
+  block=2048       Set the block size for the disk to 2048 bytes
+  check=relaxed    Matches filenames with different cases
+  check=strict     Matches only filenames with the exact same case
+  cruft            Try to handle badly formatted CDs.
+  map=off          Do not map non-Rock Ridge filenames to lower case
+  map=normal       Map non-Rock Ridge filenames to lower case
+  map=acorn        As map=normal but also apply Acorn extensions if present
+  mode=xxx         Sets the permissions on files to xxx unless Rock Ridge
+		   extensions set the permissions otherwise
+  dmode=xxx        Sets the permissions on directories to xxx unless Rock Ridge
+		   extensions set the permissions otherwise
+  overriderockperm Set permissions on files and directories according to
+		   'mode' and 'dmode' even though Rock Ridge extensions are
+		   present.
+  nojoliet         Ignore Joliet extensions if they are present.
+  norock           Ignore Rock Ridge extensions if they are present.
+  hide		   Completely strip hidden files from the file system.
+  showassoc	   Show files marked with the 'associated' bit
+  unhide	   Deprecated; showing hidden files is now default;
+		   If given, it is a synonym for 'showassoc' which will
+		   recreate previous unhide behavior
+  session=x        Select number of session on multisession CD
+  sbsector=xxx     Session begins from sector xxx
+ ================= ============================================================
+
+Recommended documents about ISO 9660 standard are located at:
+
+- http://www.y-adagio.com/
+- ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
+
+Quoting from the PDF "This 2nd Edition of Standard ECMA-119 is technically
+identical with ISO 9660.", so it is a valid and gratis substitute of the
+official ISO specification.
diff --git a/Documentation/filesystems/isofs.txt b/Documentation/filesystems/isofs.txt
deleted file mode 100644
index ba0a93384de0..000000000000
--- a/Documentation/filesystems/isofs.txt
+++ /dev/null
@@ -1,48 +0,0 @@
-Mount options that are the same as for msdos and vfat partitions.
-
-  gid=nnn	All files in the partition will be in group nnn.
-  uid=nnn	All files in the partition will be owned by user id nnn.
-  umask=nnn	The permission mask (see umask(1)) for the partition.
-
-Mount options that are the same as vfat partitions. These are only useful
-when using discs encoded using Microsoft's Joliet extensions.
-  iocharset=name Character set to use for converting from Unicode to
-		ASCII.  Joliet filenames are stored in Unicode format, but
-		Unix for the most part doesn't know how to deal with Unicode.
-		There is also an option of doing UTF-8 translations with the
-		utf8 option.
-  utf8          Encode Unicode names in UTF-8 format. Default is no.
-
-Mount options unique to the isofs filesystem.
-  block=512     Set the block size for the disk to 512 bytes
-  block=1024    Set the block size for the disk to 1024 bytes
-  block=2048    Set the block size for the disk to 2048 bytes
-  check=relaxed Matches filenames with different cases
-  check=strict  Matches only filenames with the exact same case
-  cruft         Try to handle badly formatted CDs.
-  map=off       Do not map non-Rock Ridge filenames to lower case
-  map=normal    Map non-Rock Ridge filenames to lower case
-  map=acorn     As map=normal but also apply Acorn extensions if present
-  mode=xxx      Sets the permissions on files to xxx unless Rock Ridge
-		extensions set the permissions otherwise
-  dmode=xxx     Sets the permissions on directories to xxx unless Rock Ridge
-		extensions set the permissions otherwise
-  overriderockperm Set permissions on files and directories according to
-		'mode' and 'dmode' even though Rock Ridge extensions are
-		present.
-  nojoliet      Ignore Joliet extensions if they are present.
-  norock        Ignore Rock Ridge extensions if they are present.
-  hide		Completely strip hidden files from the file system.
-  showassoc	Show files marked with the 'associated' bit
-  unhide	Deprecated; showing hidden files is now default;
-		If given, it is a synonym for 'showassoc' which will
-		recreate previous unhide behavior
-  session=x     Select number of session on multisession CD
-  sbsector=xxx  Session begins from sector xxx
-
-Recommended documents about ISO 9660 standard are located at:
-http://www.y-adagio.com/
-ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
-Quoting from the PDF "This 2nd Edition of Standard ECMA-119 is technically 
-identical with ISO 9660.", so it is a valid and gratis substitute of the
-official ISO specification.
diff --git a/Documentation/filesystems/nfs/index.rst b/Documentation/filesystems/nfs/index.rst
new file mode 100644
index 000000000000..65805624e39b
--- /dev/null
+++ b/Documentation/filesystems/nfs/index.rst
@@ -0,0 +1,13 @@
+===============================
+NFS
+===============================
+
+
+.. toctree::
+   :maxdepth: 1
+
+   pnfs
+   rpc-cache
+   rpc-server-gss
+   nfs41-server
+   knfsd-stats
diff --git a/Documentation/filesystems/nfs/knfsd-stats.txt b/Documentation/filesystems/nfs/knfsd-stats.rst
index 1a5d82180b84..80bcf13550de 100644
--- a/Documentation/filesystems/nfs/knfsd-stats.txt
+++ b/Documentation/filesystems/nfs/knfsd-stats.rst
@@ -1,7 +1,9 @@
-
+============================
 Kernel NFS Server Statistics
 ============================
 
+:Authors: Greg Banks <[email protected]> - 26 Mar 2009
+
 This document describes the format and semantics of the statistics
 which the kernel NFS server makes available to userspace.  These
 statistics are available in several text form pseudo files, each of
@@ -18,7 +20,7 @@ by parsing routines.  All other lines contain a sequence of fields
 separated by whitespace.
 
 /proc/fs/nfsd/pool_stats
-------------------------
+========================
 
 This file is available in kernels from 2.6.30 onwards, if the
 /proc/fs/nfsd filesystem is mounted (it almost always should be).
@@ -109,15 +111,12 @@ this case), or the transport can be enqueued for later attention
 (sockets-enqueued counts this case), or the packet can be temporarily
 deferred because the transport is currently being used by an nfsd
 thread.  This last case is not very interesting and is not explicitly
-counted, but can be inferred from the other counters thus:
+counted, but can be inferred from the other counters thus::
 
-packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
+	packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
 
 
 More
-----
-Descriptions of the other statistics file should go here.
-
+====
 
-Greg Banks <[email protected]>
-26 Mar 2009
+Descriptions of the other statistics file should go here.
diff --git a/Documentation/filesystems/nfs/nfs41-server.rst b/Documentation/filesystems/nfs/nfs41-server.rst
new file mode 100644
index 000000000000..16b5f02f81c3
--- /dev/null
+++ b/Documentation/filesystems/nfs/nfs41-server.rst
@@ -0,0 +1,256 @@
+=============================
+NFSv4.1 Server Implementation
+=============================
+
+Server support for minorversion 1 can be controlled using the
+/proc/fs/nfsd/versions control file.  The string output returned
+by reading this file will contain either "+4.1" or "-4.1"
+correspondingly.
+
+Currently, server support for minorversion 1 is enabled by default.
+It can be disabled at run time by writing the string "-4.1" to
+the /proc/fs/nfsd/versions control file.  Note that to write this
+control file, the nfsd service must be taken down.  You can use rpc.nfsd
+for this; see rpc.nfsd(8).
+
+(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and
+"-4", respectively.  Therefore, code meant to work on both new and old
+kernels must turn 4.1 on or off *before* turning support for version 4
+on or off; rpc.nfsd does this correctly.)
+
+The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
+on RFC 5661.
+
+From the many new features in NFSv4.1 the current implementation
+focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
+"exactly once" semantics and better control and throttling of the
+resources allocated for each client.
+
+The table below, taken from the NFSv4.1 document, lists
+the operations that are mandatory to implement (REQ), optional
+(OPT), and NFSv4.0 operations that are required not to implement (MNI)
+in minor version 1.  The first column indicates the operations that
+are not supported yet by the linux server implementation.
+
+The OPTIONAL features identified and their abbreviations are as follows:
+
+- **pNFS**	Parallel NFS
+- **FDELG**	File Delegations
+- **DDELG**	Directory Delegations
+
+The following abbreviations indicate the linux server implementation status.
+
+- **I**	Implemented NFSv4.1 operations.
+- **NS**	Not Supported.
+- **NS\***	Unimplemented optional feature.
+
+Operations
+==========
+
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| Implementation status | Operation            | REQ,REC, OPT or NMI | Feature (REQ, REC or OPT) | Definition     |
++=======================+======================+=====================+===========================+================+
+|                       | ACCESS               | REQ                 |                           | Section 18.1   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | BACKCHANNEL_CTL      | REQ                 |                           | Section 18.33  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | BIND_CONN_TO_SESSION | REQ                 |                           | Section 18.34  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | CLOSE                | REQ                 |                           | Section 18.2   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | COMMIT               | REQ                 |                           | Section 18.3   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | CREATE               | REQ                 |                           | Section 18.4   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | CREATE_SESSION       | REQ                 |                           | Section 18.36  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | DELEGPURGE           | OPT                 | FDELG (REQ)               | Section 18.5   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | DELEGRETURN          | OPT                 | FDELG,                    | Section 18.6   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       |                      |                     | DDELG, pNFS               |                |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       |                      |                     | (REQ)                     |                |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | DESTROY_CLIENTID     | REQ                 |                           | Section 18.50  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | DESTROY_SESSION      | REQ                 |                           | Section 18.37  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | EXCHANGE_ID          | REQ                 |                           | Section 18.35  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | FREE_STATEID         | REQ                 |                           | Section 18.38  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | GETATTR              | REQ                 |                           | Section 18.7   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | GETDEVICEINFO        | OPT                 | pNFS (REQ)                | Section 18.40  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | GETDEVICELIST        | OPT                 | pNFS (OPT)                | Section 18.41  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | GETFH                | REQ                 |                           | Section 18.8   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | GET_DIR_DELEGATION   | OPT                 | DDELG (REQ)               | Section 18.39  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | LAYOUTCOMMIT         | OPT                 | pNFS (REQ)                | Section 18.42  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | LAYOUTGET            | OPT                 | pNFS (REQ)                | Section 18.43  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | LAYOUTRETURN         | OPT                 | pNFS (REQ)                | Section 18.44  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LINK                 | OPT                 |                           | Section 18.9   |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOCK                 | REQ                 |                           | Section 18.10  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOCKT                | REQ                 |                           | Section 18.11  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOCKU                | REQ                 |                           | Section 18.12  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOOKUP               | REQ                 |                           | Section 18.13  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | LOOKUPP              | REQ                 |                           | Section 18.14  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | NVERIFY              | REQ                 |                           | Section 18.15  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | OPEN                 | REQ                 |                           | Section 18.16  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | OPENATTR             | OPT                 |                           | Section 18.17  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | OPEN_CONFIRM         | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | OPEN_DOWNGRADE       | REQ                 |                           | Section 18.18  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | PUTFH                | REQ                 |                           | Section 18.19  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | PUTPUBFH             | REQ                 |                           | Section 18.20  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | PUTROOTFH            | REQ                 |                           | Section 18.21  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | READ                 | REQ                 |                           | Section 18.22  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | READDIR              | REQ                 |                           | Section 18.23  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | READLINK             | OPT                 |                           | Section 18.24  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RECLAIM_COMPLETE     | REQ                 |                           | Section 18.51  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RELEASE_LOCKOWNER    | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | REMOVE               | REQ                 |                           | Section 18.25  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RENAME               | REQ                 |                           | Section 18.26  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RENEW                | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | RESTOREFH            | REQ                 |                           | Section 18.27  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SAVEFH               | REQ                 |                           | Section 18.28  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SECINFO              | REQ                 |                           | Section 18.29  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | SECINFO_NO_NAME      | REC                 | pNFS files                | Section 18.45, |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       |                      |                     | layout (REQ)              | Section 13.12  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | SEQUENCE             | REQ                 |                           | Section 18.46  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SETATTR              | REQ                 |                           | Section 18.30  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SETCLIENTID          | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | SETCLIENTID_CONFIRM  | MNI                 |                           | N/A            |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS                    | SET_SSV              | REQ                 |                           | Section 18.47  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| I                     | TEST_STATEID         | REQ                 |                           | Section 18.48  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | VERIFY               | REQ                 |                           | Section 18.31  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+| NS*                   | WANT_DELEGATION      | OPT                 | FDELG (OPT)               | Section 18.49  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+|                       | WRITE                | REQ                 |                           | Section 18.32  |
++-----------------------+----------------------+---------------------+---------------------------+----------------+
+
+
+Callback Operations
+===================
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| Implementation status | Operation               | REQ,REC, OPT or NMI | Feature (REQ, REC or OPT) | Definition    |
++=======================+=========================+=====================+===========================+===============+
+|                       | CB_GETATTR              | OPT                 | FDELG (REQ)               | Section 20.1  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| I                     | CB_LAYOUTRECALL         | OPT                 | pNFS (REQ)                | Section 20.3  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_NOTIFY               | OPT                 | DDELG (REQ)               | Section 20.4  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_NOTIFY_DEVICEID      | OPT                 | pNFS (OPT)                | Section 20.12 |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_NOTIFY_LOCK          | OPT                 |                           | Section 20.11 |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_PUSH_DELEG           | OPT                 | FDELG (OPT)               | Section 20.5  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       | CB_RECALL               | OPT                 | FDELG,                    | Section 20.2  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | DDELG, pNFS               |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | (REQ)                     |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_RECALL_ANY           | OPT                 | FDELG,                    | Section 20.6  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | DDELG, pNFS               |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | (REQ)                     |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS                    | CB_RECALL_SLOT          | REQ                 |                           | Section 20.8  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_RECALLABLE_OBJ_AVAIL | OPT                 | DDELG, pNFS               | Section 20.7  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | (REQ)                     |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| I                     | CB_SEQUENCE             | OPT                 | FDELG,                    | Section 20.9  |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | DDELG, pNFS               |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | (REQ)                     |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+| NS*                   | CB_WANTS_CANCELLED      | OPT                 | FDELG,                    | Section 20.10 |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+
+|                       |                         |                     | DDELG, pNFS               |               |
++-----------------------+-------------------------+---------------------+---------------------------+---------------+ +|                       |                         |                     | (REQ)                     |               | ++-----------------------+-------------------------+---------------------+---------------------------+---------------+ + + +Implementation notes: +===================== + +SSV: +  The spec claims this is mandatory, but we don't actually know of any +  implementations, so we're ignoring it for now.  The server returns +  NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof. + +GSS on the backchannel: +  Again, theoretically required but not widely implemented (in +  particular, the current Linux client doesn't request it).  We return +  NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION. + +DELEGPURGE: +  mandatory only for servers that support CLAIM_DELEGATE_PREV and/or +  CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that +  persist across client reboots).  Thus we need not implement this for +  now. + +EXCHANGE_ID: +  implementation ids are ignored + +CREATE_SESSION: +  backchannel attributes are ignored + +SEQUENCE: +  no support for dynamic slot table renegotiation (optional) + +Nonstandard compound limitations: +  No support for a sessions fore channel RPC compound that requires both a +  ca_maxrequestsize request and a ca_maxresponsesize reply, so we may +  fail to live up to the promise we made in CREATE_SESSION fore channel +  negotiation. + +See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues. diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt deleted file mode 100644 index 682a59fabe3f..000000000000 --- a/Documentation/filesystems/nfs/nfs41-server.txt +++ /dev/null @@ -1,173 +0,0 @@ -NFSv4.1 Server Implementation - -Server support for minorversion 1 can be controlled using the -/proc/fs/nfsd/versions control file.  
The string output returned -by reading this file will contain either "+4.1" or "-4.1" -correspondingly. - -Currently, server support for minorversion 1 is enabled by default. -It can be disabled at run time by writing the string "-4.1" to -the /proc/fs/nfsd/versions control file.  Note that to write this -control file, the nfsd service must be taken down.  You can use rpc.nfsd -for this; see rpc.nfsd(8). - -(Warning: older servers will interpret "+4.1" and "-4.1" as "+4" and -"-4", respectively.  Therefore, code meant to work on both new and old -kernels must turn 4.1 on or off *before* turning support for version 4 -on or off; rpc.nfsd does this correctly.) - -The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based -on RFC 5661. - -From the many new features in NFSv4.1 the current implementation -focuses on the mandatory-to-implement NFSv4.1 Sessions, providing -"exactly once" semantics and better control and throttling of the -resources allocated for each client. - -The table below, taken from the NFSv4.1 document, lists -the operations that are mandatory to implement (REQ), optional -(OPT), and NFSv4.0 operations that are required not to implement (MNI) -in minor version 1.  The first column indicates the operations that -are not supported yet by the linux server implementation. - -The OPTIONAL features identified and their abbreviations are as follows: -	pNFS	Parallel NFS -	FDELG	File Delegations -	DDELG	Directory Delegations - -The following abbreviations indicate the linux server implementation status. -	I	Implemented NFSv4.1 operations. -	NS	Not Supported. -	NS*	Unimplemented optional feature. 
- -Operations - -   +----------------------+------------+--------------+----------------+ -   | Operation            | REQ, REC,  | Feature      | Definition     | -   |                      | OPT, or    | (REQ, REC,   |                | -   |                      | MNI        | or OPT)      |                | -   +----------------------+------------+--------------+----------------+ -   | ACCESS               | REQ        |              | Section 18.1   | -I  | BACKCHANNEL_CTL      | REQ        |              | Section 18.33  | -I  | BIND_CONN_TO_SESSION | REQ        |              | Section 18.34  | -   | CLOSE                | REQ        |              | Section 18.2   | -   | COMMIT               | REQ        |              | Section 18.3   | -   | CREATE               | REQ        |              | Section 18.4   | -I  | CREATE_SESSION       | REQ        |              | Section 18.36  | -NS*| DELEGPURGE           | OPT        | FDELG (REQ)  | Section 18.5   | -   | DELEGRETURN          | OPT        | FDELG,       | Section 18.6   | -   |                      |            | DDELG, pNFS  |                | -   |                      |            | (REQ)        |                | -I  | DESTROY_CLIENTID     | REQ        |              | Section 18.50  | -I  | DESTROY_SESSION      | REQ        |              | Section 18.37  | -I  | EXCHANGE_ID          | REQ        |              | Section 18.35  | -I  | FREE_STATEID         | REQ        |              | Section 18.38  | -   | GETATTR              | REQ        |              | Section 18.7   | -I  | GETDEVICEINFO        | OPT        | pNFS (REQ)   | Section 18.40  | -NS*| GETDEVICELIST        | OPT        | pNFS (OPT)   | Section 18.41  | -   | GETFH                | REQ        |              | Section 18.8   | -NS*| GET_DIR_DELEGATION   | OPT        | DDELG (REQ)  | Section 18.39  | -I  | LAYOUTCOMMIT         | OPT        | pNFS (REQ)   | Section 18.42  | -I  | LAYOUTGET            | OPT        | pNFS (REQ)   | 
Section 18.43  | -I  | LAYOUTRETURN         | OPT        | pNFS (REQ)   | Section 18.44  | -   | LINK                 | OPT        |              | Section 18.9   | -   | LOCK                 | REQ        |              | Section 18.10  | -   | LOCKT                | REQ        |              | Section 18.11  | -   | LOCKU                | REQ        |              | Section 18.12  | -   | LOOKUP               | REQ        |              | Section 18.13  | -   | LOOKUPP              | REQ        |              | Section 18.14  | -   | NVERIFY              | REQ        |              | Section 18.15  | -   | OPEN                 | REQ        |              | Section 18.16  | -NS*| OPENATTR             | OPT        |              | Section 18.17  | -   | OPEN_CONFIRM         | MNI        |              | N/A            | -   | OPEN_DOWNGRADE       | REQ        |              | Section 18.18  | -   | PUTFH                | REQ        |              | Section 18.19  | -   | PUTPUBFH             | REQ        |              | Section 18.20  | -   | PUTROOTFH            | REQ        |              | Section 18.21  | -   | READ                 | REQ        |              | Section 18.22  | -   | READDIR              | REQ        |              | Section 18.23  | -   | READLINK             | OPT        |              | Section 18.24  | -   | RECLAIM_COMPLETE     | REQ        |              | Section 18.51  | -   | RELEASE_LOCKOWNER    | MNI        |              | N/A            | -   | REMOVE               | REQ        |              | Section 18.25  | -   | RENAME               | REQ        |              | Section 18.26  | -   | RENEW                | MNI        |              | N/A            | -   | RESTOREFH            | REQ        |              | Section 18.27  | -   | SAVEFH               | REQ        |              | Section 18.28  | -   | SECINFO              | REQ        |              | Section 18.29  | -I  | SECINFO_NO_NAME      | REC        | pNFS files   | 
Section 18.45, | -   |                      |            | layout (REQ) | Section 13.12  | -I  | SEQUENCE             | REQ        |              | Section 18.46  | -   | SETATTR              | REQ        |              | Section 18.30  | -   | SETCLIENTID          | MNI        |              | N/A            | -   | SETCLIENTID_CONFIRM  | MNI        |              | N/A            | -NS | SET_SSV              | REQ        |              | Section 18.47  | -I  | TEST_STATEID         | REQ        |              | Section 18.48  | -   | VERIFY               | REQ        |              | Section 18.31  | -NS*| WANT_DELEGATION      | OPT        | FDELG (OPT)  | Section 18.49  | -   | WRITE                | REQ        |              | Section 18.32  | - -Callback Operations - -   +-------------------------+-----------+-------------+---------------+ -   | Operation               | REQ, REC, | Feature     | Definition    | -   |                         | OPT, or   | (REQ, REC,  |               | -   |                         | MNI       | or OPT)     |               | -   +-------------------------+-----------+-------------+---------------+ -   | CB_GETATTR              | OPT       | FDELG (REQ) | Section 20.1  | -I  | CB_LAYOUTRECALL         | OPT       | pNFS (REQ)  | Section 20.3  | -NS*| CB_NOTIFY               | OPT       | DDELG (REQ) | Section 20.4  | -NS*| CB_NOTIFY_DEVICEID      | OPT       | pNFS (OPT)  | Section 20.12 | -NS*| CB_NOTIFY_LOCK          | OPT       |             | Section 20.11 | -NS*| CB_PUSH_DELEG           | OPT       | FDELG (OPT) | Section 20.5  | -   | CB_RECALL               | OPT       | FDELG,      | Section 20.2  | -   |                         |           | DDELG, pNFS |               | -   |                         |           | (REQ)       |               | -NS*| CB_RECALL_ANY           | OPT       | FDELG,      | Section 20.6  | -   |                         |           | DDELG, pNFS |               | -   |                         |   
        | (REQ)       |               | -NS | CB_RECALL_SLOT          | REQ       |             | Section 20.8  | -NS*| CB_RECALLABLE_OBJ_AVAIL | OPT       | DDELG, pNFS | Section 20.7  | -   |                         |           | (REQ)       |               | -I  | CB_SEQUENCE             | OPT       | FDELG,      | Section 20.9  | -   |                         |           | DDELG, pNFS |               | -   |                         |           | (REQ)       |               | -NS*| CB_WANTS_CANCELLED      | OPT       | FDELG,      | Section 20.10 | -   |                         |           | DDELG, pNFS |               | -   |                         |           | (REQ)       |               | -   +-------------------------+-----------+-------------+---------------+ - -Implementation notes: - -SSV: -* The spec claims this is mandatory, but we don't actually know of any -  implementations, so we're ignoring it for now.  The server returns -  NFS4ERR_ENCR_ALG_UNSUPP on EXCHANGE_ID, which should be future-proof. - -GSS on the backchannel: -* Again, theoretically required but not widely implemented (in -  particular, the current Linux client doesn't request it).  We return -  NFS4ERR_ENCR_ALG_UNSUPP on CREATE_SESSION. - -DELEGPURGE: -* mandatory only for servers that support CLAIM_DELEGATE_PREV and/or -  CLAIM_DELEG_PREV_FH (which allows clients to keep delegations that -  persist across client reboots).  Thus we need not implement this for -  now. - -EXCHANGE_ID: -* implementation ids are ignored - -CREATE_SESSION: -* backchannel attributes are ignored - -SEQUENCE: -* no support for dynamic slot table renegotiation (optional) - -Nonstandard compound limitations: -* No support for a sessions fore channel RPC compound that requires both a -  ca_maxrequestsize request and a ca_maxresponsesize reply, so we may -  fail to live up to the promise we made in CREATE_SESSION fore channel -  negotiation. 
- -See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues. diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.rst index 80dc0bdc302a..7c470ecdc3a9 100644 --- a/Documentation/filesystems/nfs/pnfs.txt +++ b/Documentation/filesystems/nfs/pnfs.rst @@ -1,15 +1,17 @@ -Reference counting in pnfs: +========================== +Reference counting in pnfs  ==========================  The are several inter-related caches.  We have layouts which can  reference multiple devices, each of which can reference multiple data servers.  Each data server can be referenced by multiple devices.  Each device -can be referenced by multiple layouts.  To keep all of this straight, +can be referenced by multiple layouts. To keep all of this straight,  we need to reference count.  struct pnfs_layout_hdr ----------------------- +====================== +  The on-the-wire command LAYOUTGET corresponds to struct  pnfs_layout_segment, usually referred to by the variable name lseg.  Each nfs_inode may hold a pointer to a cache of these layout @@ -25,7 +27,8 @@ the reference count, as the layout is kept around by the lseg that  keeps it in the list.  deviceid_cache --------------- +============== +  lsegs reference device ids, which are resolved per nfs_client and  layout driver type.  The device ids are held in a RCU cache (struct  nfs4_deviceid_cache).  The cache itself is referenced across each @@ -38,24 +41,26 @@ justification, but seems reasonable given that we can have multiple  deviceid's per filesystem, and multiple filesystems per nfs_client.  The hash code is copied from the nfsd code base.  A discussion of -hashing and variations of this algorithm can be found at: -http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809 +hashing and variations of this algorithm can be found `here. 
+<http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809>`_  data server cache ------------------ +================= +  file driver devices refer to data servers, which are kept in a module  level cache.  Its reference is held over the lifetime of the deviceid  pointing to it.  lseg ----- +==== +  lseg maintains an extra reference corresponding to the NFS_LSEG_VALID  bit which holds it in the pnfs_layout_hdr's list.  When the final lseg  is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED  bit is set, preventing any new lsegs from being added.  layout drivers --------------- +==============  PNFS utilizes what is called layout drivers. The STD defines 4 basic  layout types: "files", "objects", "blocks", and "flexfiles". For each @@ -68,6 +73,6 @@ Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory  Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory  blocks-layout setup -------------------- +===================  TODO: Document the setup needs of the blocks layout driver diff --git a/Documentation/filesystems/nfs/rpc-cache.txt b/Documentation/filesystems/nfs/rpc-cache.rst index c4dac829db0f..bb164eea969b 100644 --- a/Documentation/filesystems/nfs/rpc-cache.txt +++ b/Documentation/filesystems/nfs/rpc-cache.rst @@ -1,9 +1,14 @@ -	This document gives a brief introduction to the caching +========= +RPC Cache +========= + +This document gives a brief introduction to the caching  mechanisms in the sunrpc layer that is used, in particular,  for NFS authentication. -CACHES +Caches  ====== +  The caching replaces the old exports table and allows for  a wide variety of values to be caches. @@ -12,6 +17,7 @@ quite possibly very different in content and use.  There is a corpus  of common code for managing these caches.  
Examples of caches that are likely to be needed are: +    - mapping from IP address to client name    - mapping from client name and filesystem to export options    - mapping from UID to list of GIDs, to work around NFS's limitation @@ -21,6 +27,7 @@ Examples of caches that are likely to be needed are:    - mapping from network identify to public key for crypto authentication.  The common code handles such things as: +     - general cache lookup with correct locking     - supporting 'NEGATIVE' as well as positive entries     - allowing an EXPIRED time on cache items, and removing @@ -35,60 +42,66 @@ The common code handles such things as:  Creating a Cache  ---------------- -1/ A cache needs a datum to store.  This is in the form of a -   structure definition that must contain a -     struct cache_head +-  A cache needs a datum to store.  This is in the form of a +   structure definition that must contain a struct cache_head     as an element, usually the first.     It will also contain a key and some content.     Each cache element is reference counted and contains     expiry and update times for use in cache management. -2/ A cache needs a "cache_detail" structure that +-  A cache needs a "cache_detail" structure that     describes the cache.  This stores the hash table, some     parameters for cache management, and some operations detailing how     to work with particular cache items. -   The operations requires are: -   	struct cache_head *alloc(void) -		This simply allocates appropriate memory and returns -   		a pointer to the cache_detail embedded within the -		structure -	void cache_put(struct kref *) -		This is called when the last reference to an item is -		dropped.  The pointer passed is to the 'ref' field -		in the cache_head.  cache_put should release any -		references create by 'cache_init' and, if CACHE_VALID -		is set, any references created by cache_update. -		It should then release the memory allocated by -   		'alloc'. 
-        int match(struct cache_head *orig, struct cache_head *new) -		test if the keys in the two structures match.  Return -		1 if they do, 0 if they don't. -	void init(struct cache_head *orig, struct cache_head *new) -		Set the 'key' fields in 'new' from 'orig'.  This may -		include taking references to shared objects. -	void update(struct cache_head *orig, struct cache_head *new) -		Set the 'content' fileds in 'new' from 'orig'. -	int cache_show(struct seq_file *m, struct cache_detail *cd, -			struct cache_head *h) -		Optional.  Used to provide a /proc file that lists the -		contents of a cache.  This should show one item, -   		usually on just one line. -	int cache_request(struct cache_detail *cd, struct cache_head *h, -   		char **bpp, int *blen) -		Format a request to be send to user-space for an item -   		to be instantiated.  *bpp is a buffer of size *blen. -		bpp should be moved forward over the encoded message, -		and  *blen should be reduced to show how much free -		space remains.  Return 0 on success or <0 if not -		enough room or other problem. -	int cache_parse(struct cache_detail *cd, char *buf, int len) -		A message from user space has arrived to fill out a -		cache entry.  It is in 'buf' of length 'len'. -		cache_parse should parse this, find the item in the -		cache with sunrpc_cache_lookup_rcu, and update the item -		with sunrpc_cache_update. - - -3/ A cache needs to be registered using cache_register().  This + +   The operations are: + +    struct cache_head \*alloc(void) +      This simply allocates appropriate memory and returns +      a pointer to the cache_detail embedded within the +      structure + +    void cache_put(struct kref \*) +      This is called when the last reference to an item is +      dropped.  The pointer passed is to the 'ref' field +      in the cache_head.  cache_put should release any +      references create by 'cache_init' and, if CACHE_VALID +      is set, any references created by cache_update. 
+      It should then release the memory allocated by +      'alloc'. + +    int match(struct cache_head \*orig, struct cache_head \*new) +      test if the keys in the two structures match.  Return +      1 if they do, 0 if they don't. + +    void init(struct cache_head \*orig, struct cache_head \*new) +      Set the 'key' fields in 'new' from 'orig'.  This may +      include taking references to shared objects. + +    void update(struct cache_head \*orig, struct cache_head \*new) +      Set the 'content' fileds in 'new' from 'orig'. + +    int cache_show(struct seq_file \*m, struct cache_detail \*cd, struct cache_head \*h) +      Optional.  Used to provide a /proc file that lists the +      contents of a cache.  This should show one item, +      usually on just one line. + +    int cache_request(struct cache_detail \*cd, struct cache_head \*h, char \*\*bpp, int \*blen) +      Format a request to be send to user-space for an item +      to be instantiated.  \*bpp is a buffer of size \*blen. +      bpp should be moved forward over the encoded message, +      and  \*blen should be reduced to show how much free +      space remains.  Return 0 on success or <0 if not +      enough room or other problem. + +    int cache_parse(struct cache_detail \*cd, char \*buf, int len) +      A message from user space has arrived to fill out a +      cache entry.  It is in 'buf' of length 'len'. +      cache_parse should parse this, find the item in the +      cache with sunrpc_cache_lookup_rcu, and update the item +      with sunrpc_cache_update. + + +-  A cache needs to be registered using cache_register().  This     includes it on a list of caches that will be regularly     cleaned to discard old data. @@ -107,7 +120,7 @@ cache_check will return -ENOENT in the entry is negative or if an up  call is needed but not possible, -EAGAIN if an upcall is pending,  or 0 if the data is valid; -cache_check can be passed a "struct cache_req *".  
This structure is +cache_check can be passed a "struct cache_req\*".  This structure is  typically embedded in the actual request and can be used to create a  deferred copy of the request (struct cache_deferred_req).  This is  done when the found cache item is not uptodate, but the is reason to @@ -139,9 +152,11 @@ The 'channel' works a bit like a datagram socket. Each 'write' is  passed as a whole to the cache for parsing and interpretation.  Each cache can treat the write requests differently, but it is  expected that a message written will contain: +    - a key    - an expiry time    - a content. +  with the intention that an item in the cache with the give key  should be create or updated to have the given content, and the  expiry time should be set on that item. @@ -156,7 +171,8 @@ If there are no more requests to return, read will return EOF, but a  select or poll for read will block waiting for another request to be  added. -Thus a user-space helper is likely to: +Thus a user-space helper is likely to:: +    open the channel.      select for readable      read a request @@ -175,12 +191,13 @@ Each cache should also define a "cache_request" method which  takes a cache item and encodes a request into the buffer  provided. -Note: If a cache has no active readers on the channel, and has had not -active readers for more than 60 seconds, further requests will not be -added to the channel but instead all lookups that do not find a valid -entry will fail.  This is partly for backward compatibility: The -previous nfs exports table was deemed to be authoritative and a -failed lookup meant a definite 'no'. +.. note:: +  If a cache has no active readers on the channel, and has had not +  active readers for more than 60 seconds, further requests will not be +  added to the channel but instead all lookups that do not find a valid +  entry will fail.  
This is partly for backward compatibility: The +  previous nfs exports table was deemed to be authoritative and a +  failed lookup meant a definite 'no'.  request/response format  ----------------------- @@ -193,10 +210,11 @@ with precisely one newline character which should be at the end.  Fields within the record should be separated by spaces, normally one.  If spaces, newlines, or nul characters are needed in a field they  much be quoted.  two mechanisms are available: -1/ If a field begins '\x' then it must contain an even number of + +-  If a field begins '\x' then it must contain an even number of     hex digits, and pairs of these digits provide the bytes in the     field. -2/ otherwise a \ in the field must be followed by 3 octal digits +-  otherwise a \ in the field must be followed by 3 octal digits     which give the code for a byte.  Other characters are treated     as them selves.  At the very least, space, newline, nul, and     '\' must be quoted in this way. diff --git a/Documentation/filesystems/nfs/rpc-server-gss.txt b/Documentation/filesystems/nfs/rpc-server-gss.rst index 310bbbaf9080..812754576845 100644 --- a/Documentation/filesystems/nfs/rpc-server-gss.txt +++ b/Documentation/filesystems/nfs/rpc-server-gss.rst @@ -1,4 +1,4 @@ - +=========================================  rpcsec_gss support for kernel RPC servers  ========================================= @@ -9,14 +9,17 @@ NFSv4.1 and higher don't require the client to act as a server for the  purposes of authentication.)  RPCGSS is specified in a few IETF documents: +   - RFC2203 v1: http://tools.ietf.org/rfc/rfc2203.txt   - RFC5403 v2: http://tools.ietf.org/rfc/rfc5403.txt +  and there is a 3rd version  being proposed: +   - http://tools.ietf.org/id/draft-williams-rpcsecgssv3.txt     (At draft n. 02 at the time of writing)  Background ----------- +==========  The RPCGSS Authentication method describes a way to perform GSSAPI  Authentication for NFS.  
Although GSSAPI is itself completely mechanism @@ -29,6 +32,7 @@ depends on GSSAPI extensions that are KRB5 specific.  GSSAPI is a complex library, and implementing it completely in kernel is  unwarranted. However GSSAPI operations are fundementally separable in 2  parts: +  - initial context establishment  - integrity/privacy protection (signing and encrypting of individual    packets) @@ -41,7 +45,7 @@ kernel, but leave the initial context establishment to userspace.  We  need upcalls to request userspace to perform context establishment.  NFS Server Legacy Upcall Mechanism ----------------------------------- +==================================  The classic upcall mechanism uses a custom text based upcall mechanism  to talk to a custom daemon called rpc.svcgssd that is provide by the @@ -62,21 +66,20 @@ groups) due to limitation on the size of the buffer that can be send  back to the kernel (4KiB).  NFS Server New RPC Upcall Mechanism ------------------------------------ +===================================  The newer upcall mechanism uses RPC over a unix socket to a daemon  called gss-proxy, implemented by a userspace program called Gssproxy. -The gss_proxy RPC protocol is currently documented here: - -	https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation +The gss_proxy RPC protocol is currently documented `here +<https://fedorahosted.org/gss-proxy/wiki/ProtocolDocumentation>`_.  This upcall mechanism uses the kernel rpc client and connects to the gssproxy  userspace program over a regular unix socket. The gssproxy protocol does not  suffer from the size limitations of the legacy protocol.  Negotiating Upcall Mechanisms ------------------------------ +=============================  To provide backward compatibility, the kernel defaults to using the  legacy mechanism.  
To switch to the new mechanism, gss-proxy must bind diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.rst index f2f3f8592a6f..6c49f04e9e0a 100644 --- a/Documentation/filesystems/nilfs2.txt +++ b/Documentation/filesystems/nilfs2.rst @@ -1,5 +1,8 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======  NILFS2 ------- +======  NILFS2 is a log-structured file system (LFS) supporting continuous  snapshotting.  In addition to versioning capability of the entire file @@ -25,9 +28,9 @@ available from the following download page.  At least "mkfs.nilfs2",  cleaner or garbage collector) are required.  Details on the tools are  described in the man pages included in the package. -Project web page:    https://nilfs.sourceforge.io/ -Download page:       https://nilfs.sourceforge.io/en/download.html -List info:           http://vger.kernel.org/vger-lists.html#linux-nilfs +:Project web page:    https://nilfs.sourceforge.io/ +:Download page:       https://nilfs.sourceforge.io/en/download.html +:List info:           http://vger.kernel.org/vger-lists.html#linux-nilfs  Caveats  ======= @@ -47,6 +50,7 @@ Mount options  NILFS2 supports the following mount options:  (*) == default +======================= =======================================================  barrier(*)		This enables/disables the use of write barriers.  This  nobarrier		requires an IO stack which can support barriers, and  			if nilfs gets an error on a barrier write, it will @@ -79,6 +83,7 @@ discard			This enables/disables the use of discard/TRIM commands.  nodiscard(*)		The discard/TRIM commands are sent to the underlying  			block device when blocks are freed.  This is useful  			for SSD devices and sparse/thinly-provisioned LUNs. +======================= =======================================================  Ioctls  ====== @@ -87,9 +92,11 @@ There is some NILFS2 specific functionality which can be accessed by application  through the system call interfaces. 
The list of all NILFS2 specific ioctls are  shown in the table below. -Table of NILFS2 specific ioctls -.............................................................................. +Table of NILFS2 specific ioctls: + + ============================== ===============================================   Ioctl			        Description + ============================== ===============================================   NILFS_IOCTL_CHANGE_CPMODE      Change mode of given checkpoint between  			        checkpoint and snapshot state. This ioctl is  			        used in chcp and mkcp utilities. @@ -142,11 +149,12 @@ Table of NILFS2 specific ioctls   NILFS_IOCTL_SET_ALLOC_RANGE    Define lower limit of segments in bytes and  			        upper limit of segments in bytes. This ioctl  			        is used by nilfs_resize utility. + ============================== ===============================================  NILFS2 usage  ============ -To use nilfs2 as a local file system, simply: +To use nilfs2 as a local file system, simply::   # mkfs -t nilfs2 /dev/block_device   # mount -t nilfs2 /dev/block_device /dir @@ -157,18 +165,20 @@ This will also invoke the cleaner through the mount helper program  Checkpoints and snapshots are managed by the following commands.  Their manpages are included in the nilfs-utils package above. +  ====     ===========================================================    lscp     list checkpoints or snapshots.    mkcp     make a checkpoint or a snapshot.    chcp     change an existing checkpoint to a snapshot or vice versa.    rmcp     invalidate specified checkpoint(s). +  ====     =========================================================== -To mount a snapshot, +To mount a snapshot::   # mount -t nilfs2 -r -o cp=<cno> /dev/block_device /snap_dir  where <cno> is the checkpoint number of the snapshot. 
-To unmount the NILFS2 mount point or snapshot, simply: +To unmount the NILFS2 mount point or snapshot, simply::   # umount /dir @@ -181,7 +191,7 @@ Disk format  A nilfs2 volume is equally divided into a number of segments except  for the super block (SB) and segment #0.  A segment is the container  of logs.  Each log is composed of summary information blocks, payload -blocks, and an optional super root block (SR): +blocks, and an optional super root block (SR)::     ______________________________________________________    | |SB| | Segment | Segment | Segment | ... | Segment | | @@ -200,7 +210,7 @@ blocks, and an optional super root block (SR):    |_blocks__|_________________|__|  The payload blocks are organized per file, and each file consists of -data blocks and B-tree node blocks: +data blocks and B-tree node blocks::      |<---       File-A        --->|<---       File-B        --->|     _______________________________________________________________ @@ -213,7 +223,7 @@ files without data blocks or B-tree node blocks.  The organization of the blocks is recorded in the summary information  blocks, which contains a header structure (nilfs_segment_summary), per -file structures (nilfs_finfo), and per block structures (nilfs_binfo): +file structures (nilfs_finfo), and per block structures (nilfs_binfo)::    _________________________________________________________________________   | Summary | finfo | binfo | ... | binfo | finfo | binfo | ... | binfo |... @@ -223,7 +233,7 @@ file structures (nilfs_finfo), and per block structures (nilfs_binfo):  The logs include regular files, directory files, symbolic link files  and several meta data files.  The mata data files are the files used  to maintain file system meta data.  
The current version of NILFS2 uses -the following meta data files: +the following meta data files::   1) Inode file (ifile)             -- Stores on-disk inodes   2) Checkpoint file (cpfile)       -- Stores checkpoints @@ -232,7 +242,7 @@ the following meta data files:      (DAT)                             block numbers.  This file serves to                                        make on-disk blocks relocatable. -The following figure shows a typical organization of the logs: +The following figure shows a typical organization of the logs::    _________________________________________________________________________   | Summary | regular file | file  | ... | ifile | cpfile | sufile | DAT |SR| @@ -250,7 +260,7 @@ three special inodes, inodes for the DAT, cpfile, and sufile.  Inodes  of regular files, directories, symlinks and other special files, are  included in the ifile.  The inode of ifile itself is included in the  corresponding checkpoint entry in the cpfile.  Thus, the hierarchy -among NILFS2 files can be depicted as follows: +among NILFS2 files can be depicted as follows::    Super block (SB)         | diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.rst index 553f10d03076..5bb093a26485 100644 --- a/Documentation/filesystems/ntfs.txt +++ b/Documentation/filesystems/ntfs.rst @@ -1,19 +1,21 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================  The Linux NTFS filesystem driver  ================================ -Table of contents -================= +.. 
Table of contents -- Overview -- Web site -- Features -- Supported mount options -- Known bugs and (mis-)features -- Using NTFS volume and stripe sets -  - The Device-Mapper driver -  - The Software RAID / MD driver -  - Limitations when using the MD driver +   - Overview +   - Web site +   - Features +   - Supported mount options +   - Known bugs and (mis-)features +   - Using NTFS volume and stripe sets +     - The Device-Mapper driver +     - The Software RAID / MD driver +     - Limitations when using the MD driver  Overview @@ -66,8 +68,10 @@ Features    partition by creating a large file while in Windows and then loopback    mounting the file while in Linux and creating a Linux filesystem on it that    is used to install Linux on it. -- A comparison of the two drivers using: +- A comparison of the two drivers using:: +  	time find . -type f -exec md5sum "{}" \; +    run three times in sequence with each driver (after a reboot) on a 1.4GiB    NTFS partition, showed the new driver to be 20% faster in total time elapsed    (from 9:43 minutes on average down to 7:53).  The time spent in user space @@ -104,6 +108,7 @@ In addition to the generic mount options described by the manual page for the  mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the  following mount options: +======================= =======================================================  iocharset=name		Deprecated option.  Still supported but please use  			nls=name in the future.  See description for nls=name. @@ -175,16 +180,22 @@ disable_sparse=<BOOL>	If disable_sparse is specified, creation of sparse  errors=opt		What to do when critical filesystem errors are found.  			Following values can be used for "opt": -			  continue: DEFAULT, try to clean-up as much as + +			  ========  ========================================= +			  continue  DEFAULT, try to clean-up as much as  				    possible, e.g. 
marking a corrupt inode as  				    bad so it is no longer accessed, and then  				    continue. -			  recover:  At present only supported is recovery of +			  recover   At present only supported is recovery of  				    the boot sector from the backup copy.  				    If read-only mount, the recovery is done  				    in memory only and not written to disk. -			Note that the options are additive, i.e. specifying: +			  ========  ========================================= + +			Note that the options are additive, i.e. specifying:: +  			   errors=continue,errors=recover +  			means the driver will attempt to recover and if that  			fails it will clean-up as much as possible and  			continue. @@ -202,12 +213,18 @@ mft_zone_multiplier=	Set the MFT zone multiplier for the volume (this  			In general use the default.  If you have a lot of small  			files then use a higher value.  The values have the  			following meaning: + +			      =====	    =================================  			      Value	     MFT zone size (% of volume size) +			      =====	    =================================  				1		12.5%  				2		25%  				3		37.5%  				4		50% +			      =====	    ================================= +  			Note this option is irrelevant for read-only mounts. +======================= =======================================================  Known bugs and (mis-)features @@ -252,18 +269,18 @@ To create the table describing your volume you will need to know each of its  components and their sizes in sectors, i.e. multiples of 512-byte blocks.  For NT4 fault tolerant volumes you can obtain the sizes using fdisk.  
So for -example if one of your partitions is /dev/hda2 you would do: +example if one of your partitions is /dev/hda2 you would do:: -$ fdisk -ul /dev/hda +    $ fdisk -ul /dev/hda -Disk /dev/hda: 81.9 GB, 81964302336 bytes -255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors -Units = sectors of 1 * 512 = 512 bytes +    Disk /dev/hda: 81.9 GB, 81964302336 bytes +    255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors +    Units = sectors of 1 * 512 = 512 bytes -   Device Boot      Start         End      Blocks   Id  System -   /dev/hda1   *          63     4209029     2104483+  83  Linux -   /dev/hda2         4209030    37768814    16779892+  86  NTFS -   /dev/hda3        37768815    46170809     4200997+  83  Linux +	Device Boot      Start         End      Blocks   Id  System +	/dev/hda1   *          63     4209029     2104483+  83  Linux +	/dev/hda2         4209030    37768814    16779892+  86  NTFS +	/dev/hda3        37768815    46170809     4200997+  83  Linux  And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =  33559785 sectors. @@ -271,15 +288,17 @@ And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =  For Win2k and later dynamic disks, you can for example use the ldminfo utility  which is part of the Linux LDM tools (the latest version at the time of  writing is linux-ldm-0.0.8.tar.bz2).  You can download it from: +  	http://www.linux-ntfs.org/ +  Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go  into it (cd linux-ldm-0.0.8) and change to the test directory (cd test).  You  will find the precompiled (i386) ldminfo utility there.  NOTE: You will not be  able to compile this yourself easily so use the binary version! 
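The fdisk arithmetic above (End - Start + 1, because both the Start and End sectors belong to the partition) can be sketched in C as follows. This is a minimal illustration only, not part of the NTFS driver or of any tool; the helper names partition_sectors() and sectors_to_kib() are hypothetical, and the sketch assumes the 512-byte sector units that "fdisk -ul" reports.

```c
#include <assert.h>
#include <stdint.h>

/* Sectors in a partition given the inclusive Start and End columns of
 * "fdisk -ul"; both endpoints belong to the partition, hence the + 1. */
static uint64_t partition_sectors(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}

/* fdisk's Blocks column counts 1024-byte blocks, i.e. two 512-byte
 * sectors each; a leftover odd sector is rounded down here. */
static uint64_t sectors_to_kib(uint64_t sectors)
{
	return sectors / 2;
}
```

With the /dev/hda2 figures, partition_sectors(4209030, 37768814) gives the 33559785 sectors quoted in the text, and halving that gives 16779892, consistent with the "16779892+" in the Blocks column.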
-Then you would use ldminfo in dump mode to obtain the necessary information: +Then you would use ldminfo in dump mode to obtain the necessary information:: -$ ./ldminfo --dump /dev/hda +    $ ./ldminfo --dump /dev/hda  This would dump the LDM database found on /dev/hda which describes all of your  dynamic disks and all the volumes on them.  At the bottom you will see the @@ -305,42 +324,36 @@ give you the correct information to do this.  Assuming you know all your devices and their sizes, things are easy.  For a linear raid the table would look like this (note all values are in -512-byte sectors): +512-byte sectors):: ---- cut here --- -# Offset into	Size of this	Raid type	Device		Start sector -# volume	device						of device -0		1028161		linear		/dev/hda1	0 -1028161		3903762		linear		/dev/hdb2	0 -4931923		2103211		linear		/dev/hdc1	0 ---- cut here --- +    # Offset into	Size of this	Raid type	Device		Start sector +    # volume	device						of device +    0		1028161		linear		/dev/hda1	0 +    1028161		3903762		linear		/dev/hdb2	0 +    4931923		2103211		linear		/dev/hdc1	0  For a striped volume, i.e. raid level 0, you will need to know the chunk size  you used when creating the volume.  Windows uses 64kiB as the default, so it  will probably be this unless you changed the defaults when creating the array.  
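Because each line of a linear table starts where the previous device ended, the offsets in the first column are running totals of the device sizes, and a 64kiB chunk size corresponds to 128 sectors of 512 bytes. The C sketch below prints a linear table in the same column order as the example (offset, size, "linear", device, start sector). print_linear_table() and chunk_sectors() are hypothetical helpers, not part of dmsetup, and the sketch assumes, as the example does, that every component device is used from sector 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SECTOR_SIZE 512u

/* Convert a RAID chunk size in bytes into 512-byte sectors. */
static uint64_t chunk_sectors(uint64_t chunk_bytes)
{
	assert(chunk_bytes % SECTOR_SIZE == 0);
	return chunk_bytes / SECTOR_SIZE;
}

/* Print one "linear" table line per device.  Each device is mapped at
 * the running total of the sizes before it and used from sector 0.
 * Returns the size of the whole volume in sectors. */
static uint64_t print_linear_table(FILE *out, const char *const devs[],
				   const uint64_t sizes[], size_t n)
{
	uint64_t offset = 0;

	for (size_t i = 0; i < n; i++) {
		fprintf(out, "%llu %llu linear %s 0\n",
			(unsigned long long)offset,
			(unsigned long long)sizes[i], devs[i]);
		offset += sizes[i];
	}
	return offset;
}
```

Given the three example devices, it reproduces the offsets 0, 1028161 and 4931923 and returns a total volume size of 7035134 sectors. chunk_sectors(64 * 1024) yields the 128 used in the striped example.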
For a raid level 0 the table would look like this (note all values are in -512-byte sectors): +512-byte sectors):: ---- cut here --- -# Offset   Size	    Raid     Number   Chunk  1st        Start	2nd	  Start -# into     of the   type     of	      size   Device	in	Device	  in -# volume   volume	     stripes			device		  device -0	   2056320  striped  2	      128    /dev/hda1	0	/dev/hdb1 0 ---- cut here --- +    # Offset   Size	    Raid     Number   Chunk  1st        Start	2nd	  Start +    # into     of the   type     of	      size   Device	in	Device	  in +    # volume   volume	     stripes			device		  device +    0	   2056320  striped  2	      128    /dev/hda1	0	/dev/hdb1 0  If there are more than two devices, just add each of them to the end of the  line.  Finally, for a mirrored volume, i.e. raid level 1, the table would look like -this (note all values are in 512-byte sectors): +this (note all values are in 512-byte sectors):: ---- cut here --- -# Ofs Size   Raid   Log  Number Region Should Number Source  Start Target Start -# in  of the type   type of log size   sync?  of     Device  in    Device in -# vol volume		 params		     mirrors	     Device	  Device -0    2056320 mirror core 2	16     nosync 2	   /dev/hda1 0   /dev/hdb1 0 ---- cut here --- +    # Ofs Size   Raid   Log  Number Region Should Number Source  Start Target Start +    # in  of the type   type of log size   sync?  of     Device  in    Device in +    # vol volume		 params		     mirrors	     Device	  Device +    0    2056320 mirror core 2	16     nosync 2	   /dev/hda1 0   /dev/hdb1 0  If you are mirroring to multiple devices you can specify further targets at the  end of the line. @@ -353,17 +366,17 @@ to the "Target Device" or if you specified multiple target devices to all of  them.  Once you have your table, save it in a file somewhere (e.g. 
/etc/ntfsvolume1), -and hand it over to dmsetup to work with, like so: +and hand it over to dmsetup to work with, like so:: -$ dmsetup create myvolume1 /etc/ntfsvolume1 +    $ dmsetup create myvolume1 /etc/ntfsvolume1  You can obviously replace "myvolume1" with whatever name you like.  If it all worked, you will now have the device /dev/device-mapper/myvolume1  which you can then just use as an argument to the mount command as usual to -mount the ntfs volume.  For example: +mount the ntfs volume.  For example:: -$ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1 +    $ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1  (You need to create the directory /mnt/myvol1 first and of course you can use  anything you like instead of /mnt/myvol1 as long as it is an existing @@ -395,18 +408,18 @@ Windows by default uses a stripe chunk size of 64k, so you probably want the  "chunk-size 64k" option for each raid-disk, too.  For example, if you have a stripe set consisting of two partitions /dev/hda5 -and /dev/hdb1 your /etc/raidtab would look like this: - -raiddev /dev/md0 -	raid-level	0 -	nr-raid-disks	2 -	nr-spare-disks	0 -	persistent-superblock	0 -	chunk-size	64k -	device		/dev/hda5 -	raid-disk	0 -	device		/dev/hdb1 -	raid-disk	1 +and /dev/hdb1 your /etc/raidtab would look like this:: + +    raiddev /dev/md0 +	    raid-level	0 +	    nr-raid-disks	2 +	    nr-spare-disks	0 +	    persistent-superblock	0 +	    chunk-size	64k +	    device		/dev/hda5 +	    raid-disk	0 +	    device		/dev/hdb1 +	    raid-disk	1  For linear raid, just change the raid-level above to "raid-level linear", for  mirrors, change it to "raid-level 1", and for stripe sets with parity, change @@ -427,7 +440,9 @@ Once the raidtab is setup, run for example raid0run -a to start all devices or  raid0run /dev/md0 to start a particular md device, in this case /dev/md0.  
Then just use the mount command as usual to mount the ntfs volume using for -example:	mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume +example:: + +    mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume  It is advisable to do the mount read-only to see if the md volume has been  setup correctly to avoid the possibility of causing damage to the data on the diff --git a/Documentation/filesystems/ocfs2-online-filecheck.txt b/Documentation/filesystems/ocfs2-online-filecheck.rst index 139fab175c8a..2257bb53edc1 100644 --- a/Documentation/filesystems/ocfs2-online-filecheck.txt +++ b/Documentation/filesystems/ocfs2-online-filecheck.rst @@ -1,5 +1,8 @@ -		    OCFS2 online file check -		    ----------------------- +.. SPDX-License-Identifier: GPL-2.0 + +===================================== +OCFS2 file system - online file check +=====================================  This document will describe OCFS2 online file check feature. @@ -40,7 +43,7 @@ When there are errors in the OCFS2 filesystem, they are usually accompanied  by the inode number which caused the error. This inode number would be the  input to check/fix the file. -There is a sysfs directory for each OCFS2 file system mounting: +There is a sysfs directory for each OCFS2 file system mounting::    /sys/fs/ocfs2/<devname>/filecheck @@ -50,34 +53,36 @@ communicate with kernel space, tell which file(inode number) will be checked or  fixed. Currently, three operations are supported, which includes checking  inode, fixing inode and setting the size of result record history. -1. If you want to know what error exactly happened to <inode> before fixing, do +1. 
If you want to know what error exactly happened to <inode> before fixing, do:: + +    # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check +    # cat /sys/fs/ocfs2/<devname>/filecheck/check + +The output is like this:: -  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/check -  # cat /sys/fs/ocfs2/<devname>/filecheck/check +    INO		DONE	ERROR +    39502		1	GENERATION -The output is like this: -  INO		DONE	ERROR -39502		1	GENERATION +    <INO> lists the inode numbers. +    <DONE> indicates whether the operation has been finished. +    <ERROR> says what kind of error was found. For the detailed error numbers, +    please refer to the file linux/fs/ocfs2/filecheck.h. -<INO> lists the inode numbers. -<DONE> indicates whether the operation has been finished. -<ERROR> says what kind of errors was found. For the detailed error numbers, -please refer to the file linux/fs/ocfs2/filecheck.h. +2. If you decide to fix this inode, do:: -2. If you determine to fix this inode, do +    # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix +    # cat /sys/fs/ocfs2/<devname>/filecheck/fix -  # echo "<inode>" > /sys/fs/ocfs2/<devname>/filecheck/fix -  # cat /sys/fs/ocfs2/<devname>/filecheck/fix +The output is like this:: -  INO		DONE	ERROR -39502		1	SUCCESS +    INO		DONE	ERROR +    39502		1	SUCCESS  This time, the <ERROR> column indicates whether this fix is successful or not.  3. The record cache is used to store the history of check/fix results. Its  default size is 10, and it can be adjusted within the range 10 ~ 100. You can -adjust the size like this: +adjust the size like this::    # echo "<size>" > /sys/fs/ocfs2/<devname>/filecheck/set diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.rst index 4c49e5410595..412386bc6506 100644 --- a/Documentation/filesystems/ocfs2.txt +++ b/Documentation/filesystems/ocfs2.rst @@ -1,5 +1,9 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +================  OCFS2 filesystem -================== +================ +  OCFS2 is a general purpose extent based shared disk cluster file  system with many similarities to ext3. It supports 64 bit inode  numbers, and has automatically extending metadata groups which may @@ -14,22 +18,26 @@ OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/  All code copyright 2005 Oracle except when otherwise noted. -CREDITS: +Credits +======= +  Lots of code taken from ext3 and other projects.  Authors in alphabetical order: -Joel Becker   <[email protected]> -Zach Brown    <[email protected]> -Mark Fasheh   <[email protected]> -Kurt Hackel   <[email protected]> -Tao Ma        <[email protected]> -Sunil Mushran <[email protected]> -Manish Singh  <[email protected]> -Tiger Yang    <[email protected]> + +- Joel Becker   <[email protected]> +- Zach Brown    <[email protected]> +- Mark Fasheh   <[email protected]> +- Kurt Hackel   <[email protected]> +- Tao Ma        <[email protected]> +- Sunil Mushran <[email protected]> +- Manish Singh  <[email protected]> +- Tiger Yang    <[email protected]>  Caveats  =======  Features which OCFS2 does not support yet: +  	- Directory change notification (F_NOTIFY)  	- Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease) @@ -37,8 +45,10 @@ Mount options  =============  OCFS2 supports the following mount options: +  (*) == default +======================= ========================================================  barrier=1		This enables/disables barriers. barrier=0 disables it,  			barrier=1 enables it.  errors=remount-ro(*)	Remount the filesystem read-only on an error. @@ -104,3 +114,4 @@ journal_async_commit	Commit block can be written to disk without waiting  			for descriptor blocks. If enabled older kernels cannot  			mount the device. This will enable 'journal_checksum'  			internally. 
+======================= ======================================================== diff --git a/Documentation/filesystems/omfs.rst b/Documentation/filesystems/omfs.rst new file mode 100644 index 000000000000..4c8bb3074169 --- /dev/null +++ b/Documentation/filesystems/omfs.rst @@ -0,0 +1,112 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================ +Optimized MPEG Filesystem (OMFS) +================================ + +Overview +======== + +OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR +and Rio Karma MP3 player.  The filesystem is extent-based, utilizing +block sizes from 2k to 8k, with hash-based directories.  This +filesystem driver may be used to read and write disks from these +devices. + +Note, it is not recommended that this FS be used in place of a general +filesystem for your own streaming media device.  Native Linux filesystems +will likely perform better. + +More information is available at: + +    http://linux-karma.sf.net/ + +Various utilities, including mkomfs and omfsck, are included with +omfsprogs, available at: + +    http://bobcopeland.com/karma/ + +Instructions are included in its README. + +Options +======= + +OMFS supports the following mount-time options: + +    ============   ======================================== +    uid=n          make all files owned by specified user +    gid=n          make all files owned by specified group +    umask=xxx      set permission umask to xxx +    fmask=xxx      set umask to xxx for files +    dmask=xxx      set umask to xxx for directories +    ============   ======================================== + +Disk format +=========== + +OMFS discriminates between "sysblocks" and normal data blocks.  The sysblock +group consists of super block information, file metadata, directory structures, +and extents.  Each sysblock has a header containing CRCs of the entire +sysblock, and may be mirrored in successive blocks on the disk.  
A sysblock may +have a smaller size than a data block, but since they are both addressed by the +same 64-bit block number, any remaining space in the smaller sysblock is +unused. + +Sysblock header information:: + +    struct omfs_header { +	    __be64 h_self;                  /* FS block where this is located */ +	    __be32 h_body_size;             /* size of useful data after header */ +	    __be16 h_crc;                   /* crc-ccitt of body_size bytes */ +	    char h_fill1[2]; +	    u8 h_version;                   /* version, always 1 */ +	    char h_type;                    /* OMFS_INODE_X */ +	    u8 h_magic;                     /* OMFS_IMAGIC */ +	    u8 h_check_xor;                 /* XOR of header bytes before this */ +	    __be32 h_fill2; +    }; + +Files and directories are both represented by omfs_inode:: + +    struct omfs_inode { +	    struct omfs_header i_head;      /* header */ +	    __be64 i_parent;                /* parent containing this inode */ +	    __be64 i_sibling;               /* next inode in hash bucket */ +	    __be64 i_ctime;                 /* ctime, in milliseconds */ +	    char i_fill1[35]; +	    char i_type;                    /* OMFS_[DIR,FILE] */ +	    __be32 i_fill2; +	    char i_fill3[64]; +	    char i_name[OMFS_NAMELEN];      /* filename */ +	    __be64 i_size;                  /* size of file, in bytes */ +    }; + +Directories in OMFS are implemented as a large hash table.  Filenames are +hashed then prepended into the bucket list beginning at OMFS_DIR_START. +Lookup requires hashing the filename, then seeking across i_sibling pointers +until a match is found on i_name.  Empty buckets are represented by block +pointers with all-1s (~0). 
+ +A file is an omfs_inode structure followed by an extent table beginning at +OMFS_EXTENT_START:: + +    struct omfs_extent_entry { +	    __be64 e_cluster;               /* start location of a set of blocks */ +	    __be64 e_blocks;                /* number of blocks after e_cluster */ +    }; + +    struct omfs_extent { +	    __be64 e_next;                  /* next extent table location */ +	    __be32 e_extent_count;          /* total # extents in this table */ +	    __be32 e_fill; +	    struct omfs_extent_entry e_entry;       /* start of extent entries */ +    }; + +Each extent holds the block offset followed by number of blocks allocated to +the extent.  The final extent in each table is a terminator with e_cluster +being ~0 and e_blocks being ones'-complement of the total number of blocks +in the table. + +If this table overflows, a continuation inode is written and pointed to by +e_next.  These have a header but lack the rest of the inode structure. + diff --git a/Documentation/filesystems/omfs.txt b/Documentation/filesystems/omfs.txt deleted file mode 100644 index 1d0d41ff5c65..000000000000 --- a/Documentation/filesystems/omfs.txt +++ /dev/null @@ -1,106 +0,0 @@ -Optimized MPEG Filesystem (OMFS) - -Overview -======== - -OMFS is a filesystem created by SonicBlue for use in the ReplayTV DVR -and Rio Karma MP3 player.  The filesystem is extent-based, utilizing -block sizes from 2k to 8k, with hash-based directories.  This -filesystem driver may be used to read and write disks from these -devices. - -Note, it is not recommended that this FS be used in place of a general -filesystem for your own streaming media device.  Native Linux filesystems -will likely perform better. - -More information is available at: - -    http://linux-karma.sf.net/ - -Various utilities, including mkomfs and omfsck, are included with -omfsprogs, available at: - -    http://bobcopeland.com/karma/ - -Instructions are included in its README. 
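The OMFS extent-table terminator described earlier (e_cluster of ~0, and e_blocks holding the ones' complement of the table's total block count) can be checked as in the C sketch below. It is an illustration, not the omfs driver's code. struct extent_entry is a hypothetical host-byte-order copy of omfs_extent_entry (the on-disk fields are __be64 and would have to be converted first), and the sketch assumes the entry count includes the terminator itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-byte-order copy of struct omfs_extent_entry. */
struct extent_entry {
	uint64_t e_cluster;	/* start location of a set of blocks */
	uint64_t e_blocks;	/* number of blocks after e_cluster */
};

/* Return true if entries[count - 1] is a valid terminator: e_cluster is
 * all ones (~0) and e_blocks is the ones' complement of the total number
 * of blocks in the preceding entries of this table. */
static bool extent_table_terminated(const struct extent_entry *entries,
				    size_t count)
{
	uint64_t total = 0;

	assert(count == 0 || entries != NULL);
	if (count == 0)
		return false;

	for (size_t i = 0; i + 1 < count; i++)
		total += entries[i].e_blocks;

	return entries[count - 1].e_cluster == ~(uint64_t)0 &&
	       entries[count - 1].e_blocks == ~total;
}
```

For a table with extents of 8 and 4 blocks, the terminator must carry e_blocks == ~12. Storing the complement rather than the plain total keeps the terminator distinguishable from an ordinary extent entry.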
- -Options -======= - -OMFS supports the following mount-time options: - -    uid=n        - make all files owned by specified user -    gid=n        - make all files owned by specified group -    umask=xxx    - set permission umask to xxx -    fmask=xxx    - set umask to xxx for files -    dmask=xxx    - set umask to xxx for directories - -Disk format -=========== - -OMFS discriminates between "sysblocks" and normal data blocks.  The sysblock -group consists of super block information, file metadata, directory structures, -and extents.  Each sysblock has a header containing CRCs of the entire -sysblock, and may be mirrored in successive blocks on the disk.  A sysblock may -have a smaller size than a data block, but since they are both addressed by the -same 64-bit block number, any remaining space in the smaller sysblock is -unused. - -Sysblock header information: - -struct omfs_header { -        __be64 h_self;                  /* FS block where this is located */ -        __be32 h_body_size;             /* size of useful data after header */ -        __be16 h_crc;                   /* crc-ccitt of body_size bytes */ -        char h_fill1[2]; -        u8 h_version;                   /* version, always 1 */ -        char h_type;                    /* OMFS_INODE_X */ -        u8 h_magic;                     /* OMFS_IMAGIC */ -        u8 h_check_xor;                 /* XOR of header bytes before this */ -        __be32 h_fill2; -}; - -Files and directories are both represented by omfs_inode: - -struct omfs_inode { -        struct omfs_header i_head;      /* header */ -        __be64 i_parent;                /* parent containing this inode */ -        __be64 i_sibling;               /* next inode in hash bucket */ -        __be64 i_ctime;                 /* ctime, in milliseconds */ -        char i_fill1[35]; -        char i_type;                    /* OMFS_[DIR,FILE] */ -        __be32 i_fill2; -        char i_fill3[64]; -        char i_name[OMFS_NAMELEN];      /* 
filename */ -        __be64 i_size;                  /* size of file, in bytes */ -}; - -Directories in OMFS are implemented as a large hash table.  Filenames are -hashed then prepended into the bucket list beginning at OMFS_DIR_START. -Lookup requires hashing the filename, then seeking across i_sibling pointers -until a match is found on i_name.  Empty buckets are represented by block -pointers with all-1s (~0). - -A file is an omfs_inode structure followed by an extent table beginning at -OMFS_EXTENT_START: - -struct omfs_extent_entry { -        __be64 e_cluster;               /* start location of a set of blocks */ -        __be64 e_blocks;                /* number of blocks after e_cluster */ -}; - -struct omfs_extent { -        __be64 e_next;                  /* next extent table location */ -        __be32 e_extent_count;          /* total # extents in this table */ -        __be32 e_fill; -        struct omfs_extent_entry e_entry;       /* start of extent entries */ -}; - -Each extent holds the block offset followed by number of blocks allocated to -the extent.  The final extent in each table is a terminator with e_cluster -being ~0 and e_blocks being ones'-complement of the total number of blocks -in the table. - -If this table overflows, a continuation inode is written and pointed to by -e_next.  These have a header but lack the rest of the inode structure. - diff --git a/Documentation/filesystems/orangefs.txt b/Documentation/filesystems/orangefs.rst index f4ba94950e3f..e41369709c5b 100644 --- a/Documentation/filesystems/orangefs.txt +++ b/Documentation/filesystems/orangefs.rst @@ -1,3 +1,6 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +========  ORANGEFS  ======== @@ -21,43 +24,33 @@ Orangefs features include:    * Stateless -MAILING LIST ARCHIVES +Mailing List Archives  =====================  http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/ -MAILING LIST SUBMISSIONS +Mailing List Submissions  ======================== -DOCUMENTATION +Documentation  =============  http://www.orangefs.org/documentation/ - -USERSPACE FILESYSTEM SOURCE -=========================== - -http://www.orangefs.org/download - -Orangefs versions prior to 2.9.3 would not be compatible with the -upstream version of the kernel client. - - -RUNNING ORANGEFS ON A SINGLE SERVER +Running ORANGEFS On a Single Server  ===================================  OrangeFS is usually run in large installations with multiple servers and  clients, but a complete filesystem can be run on a single machine for  development and testing. -On Fedora, install orangefs and orangefs-server. +On Fedora, install orangefs and orangefs-server:: -dnf -y install orangefs orangefs-server +    dnf -y install orangefs orangefs-server  There is an example server configuration file in  /etc/orangefs/orangefs.conf.  Change localhost to your hostname if @@ -70,29 +63,37 @@ single line.  Uncomment it and change the hostname if necessary.  This  controls clients which use libpvfs2.  This does not control the  pvfs2-client-core. -Create the filesystem. +Create the filesystem:: -pvfs2-server -f /etc/orangefs/orangefs.conf +    pvfs2-server -f /etc/orangefs/orangefs.conf -Start the server. +Start the server:: -systemctl start orangefs-server +    systemctl start orangefs-server -Test the server. +Test the server:: -pvfs2-ping -m /pvfsmnt +    pvfs2-ping -m /pvfsmnt  Start the client.  The module must be compiled in or loaded before this -point. +point:: -systemctl start orangefs-client +    systemctl start orangefs-client -Mount the filesystem. 
+Mount the filesystem:: -mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt +    mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt +Userspace Filesystem Source +=========================== -BUILDING ORANGEFS ON A SINGLE SERVER +http://www.orangefs.org/download + +Orangefs versions prior to 2.9.3 are not compatible with the +upstream version of the kernel client. + + +Building ORANGEFS on a Single Server  ====================================  Where OrangeFS cannot be installed from distribution packages, it may be @@ -102,49 +103,57 @@ You can omit --prefix if you don't care that things are sprinkled around  in /usr/local.  As of version 2.9.6, OrangeFS uses Berkeley DB by  default; we will probably be changing the default to LMDB soon. -./configure --prefix=/opt/ofs --with-db-backend=lmdb +:: + +    ./configure --prefix=/opt/ofs --with-db-backend=lmdb --disable-usrint -make +    make -make install +    make install -Create an orangefs config file. +Create an orangefs config file by running pvfs2-genconfig and +specifying a target config file. Pvfs2-genconfig will prompt you +through the setup. Generally it works fine to take the defaults, but you +should use your server's hostname, rather than "localhost", when +it comes to that question:: -/opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf +    /opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf -Create an /etc/pvfs2tab file. +Create an /etc/pvfs2tab file (localhost is fine for your pvfs2tab file):: -echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \ -    /etc/pvfs2tab -Create the mount point you specified in the tab file if needed. +    echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \ +	/etc/pvfs2tab -mkdir /pvfsmnt +Create the mount point you specified in the tab file if needed:: -Bootstrap the server. +    mkdir /pvfsmnt -/opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf +Bootstrap the server:: -Start the server. 
+    /opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf -/opt/osf/sbin/pvfs2-server /etc/pvfs2.conf +Start the server:: + +    /opt/ofs/sbin/pvfs2-server /etc/pvfs2.conf  Now the server should be running. Pvfs2-ls is a simple -test to verify that the server is running. +test to verify that the server is running:: -/opt/ofs/bin/pvfs2-ls /pvfsmnt +    /opt/ofs/bin/pvfs2-ls /pvfsmnt  If stuff seems to be working, load the kernel module and -turn on the client core. +turn on the client core:: -/opt/ofs/sbin/pvfs2-client -p /opt/osf/sbin/pvfs2-client-core +    /opt/ofs/sbin/pvfs2-client -p /opt/ofs/sbin/pvfs2-client-core -Mount your filesystem. +Mount your filesystem:: -mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt +    mount -t pvfs2 tcp://`hostname`:3334/orangefs /pvfsmnt -RUNNING XFSTESTS +Running xfstests  ================  It is useful to use a scratch filesystem with xfstests.  This can be @@ -159,21 +168,23 @@ Then there are two FileSystem sections: orangefs and scratch.  This change should be made before creating the filesystem. -pvfs2-server -f /etc/orangefs/orangefs.conf +:: + +    pvfs2-server -f /etc/orangefs/orangefs.conf -To run xfstests, create /etc/xfsqa.config. +To run xfstests, create /etc/xfsqa.config:: -TEST_DIR=/orangefs -TEST_DEV=tcp://localhost:3334/orangefs -SCRATCH_MNT=/scratch -SCRATCH_DEV=tcp://localhost:3334/scratch +    TEST_DIR=/orangefs +    TEST_DEV=tcp://localhost:3334/orangefs +    SCRATCH_MNT=/scratch +    SCRATCH_DEV=tcp://localhost:3334/scratch -Then xfstests can be run +Then xfstests can be run:: -./check -pvfs2 +    ./check -pvfs2 -OPTIONS +Options  =======  The following mount options are accepted: @@ -193,32 +204,32 @@ The following mount options are accepted:      Distributed locking is being worked on for the future. 
-DEBUGGING +Debugging  =========  If you want the debug (GOSSIP) statements in a particular -source file (inode.c for example) go to syslog: +source file (inode.c for example) go to syslog::    echo inode > /sys/kernel/debug/orangefs/kernel-debug -No debugging (the default): +No debugging (the default)::    echo none > /sys/kernel/debug/orangefs/kernel-debug -Debugging from several source files: +Debugging from several source files::    echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug -All debugging: +All debugging::    echo all > /sys/kernel/debug/orangefs/kernel-debug -Get a list of all debugging keywords: +Get a list of all debugging keywords::    cat /sys/kernel/debug/orangefs/debug-help -PROTOCOL BETWEEN KERNEL MODULE AND USERSPACE +Protocol between Kernel Module and Userspace  ============================================  Orangefs is a user space filesystem and an associated kernel module. @@ -234,7 +245,8 @@ The kernel module implements a pseudo device that userspace  can read from and write to. Userspace can also manipulate the  kernel module through the pseudo device with ioctl. -THE BUFMAP: +The Bufmap +----------  At startup userspace allocates two page-size-aligned (posix_memalign)  mlocked memory buffers, one is used for IO and one is used for readdir @@ -250,7 +262,8 @@ copied from user space to kernel space with copy_from_user and is used  to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which  then contains: -  * refcnt - a reference counter +  * refcnt +    - a reference counter    * desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's      partition size, which represents the filesystem's block size and      is used for s_blocksize in super blocks. @@ -259,17 +272,19 @@ then contains:    * desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.    * total_size - the total size of the IO buffer.    * page_count - the number of 4096 byte pages in the IO buffer. 
-  * page_array - a pointer to page_count * (sizeof(struct page*)) bytes +  * page_array - a pointer to ``page_count * (sizeof(struct page*))`` bytes      of kcalloced memory. This memory is used as an array of pointers      to each of the pages in the IO buffer through a call to get_user_pages. -  * desc_array - a pointer to desc_count * (sizeof(struct orangefs_bufmap_desc)) +  * desc_array - a pointer to ``desc_count * (sizeof(struct orangefs_bufmap_desc))``      bytes of kcalloced memory. This memory is further intialized:        user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc        structure. user_desc->ptr points to the IO buffer. -      pages_per_desc = bufmap->desc_size / PAGE_SIZE -      offset = 0 +      :: + +	pages_per_desc = bufmap->desc_size / PAGE_SIZE +	offset = 0          bufmap->desc_array[0].page_array = &bufmap->page_array[offset]          bufmap->desc_array[0].array_count = pages_per_desc = 1024 @@ -293,7 +308,8 @@ then contains:    * readdir_index_lock - a spinlock to protect readdir_index_array during      update. -OPERATIONS: +Operations +----------  The kernel module builds an "op" (struct orangefs_kernel_op_s) when it  needs to communicate with userspace. Part of the op contains the "upcall" @@ -308,13 +324,19 @@ in flight at any given time.  
Ops are stateful: - * unknown  - op was just initialized - * waiting  - op is on request_list (upward bound) - * inprogr  - op is in progress (waiting for downcall) - * serviced - op has matching downcall; ok - * purged   - op has to start a timer since client-core + * unknown +	    - op was just initialized + * waiting +	    - op is on request_list (upward bound) + * inprogr +	    - op is in progress (waiting for downcall) + * serviced +	    - op has matching downcall; ok + * purged +	    - op has to start a timer since client-core                exited uncleanly before servicing op - * given up - submitter has given up waiting for it + * given up +	    - submitter has given up waiting for it  When some arbitrary userspace program needs to perform a  filesystem operation on Orangefs (readdir, I/O, create, whatever) @@ -389,10 +411,15 @@ union of structs, each of which is associated with a particular  response type.  The several members outside of the union are: - - int32_t type - type of operation. - - int32_t status - return code for the operation. - - int64_t trailer_size - 0 unless readdir operation. - - char *trailer_buf - initialized to NULL, used during readdir operations. + + ``int32_t type`` +    - type of operation. + ``int32_t status`` +    - return code for the operation. + ``int64_t trailer_size`` +    - 0 unless readdir operation. + ``char *trailer_buf`` +    - initialized to NULL, used during readdir operations.  The appropriate member inside the union is filled out for any  particular response. @@ -449,18 +476,20 @@ Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests  made by the kernel side.  A buffer_list containing: +    - a pointer to the prepared response to the request from the      kernel (struct pvfs2_downcall_t).    - and also, in the case of a readdir request, a pointer to a      buffer containing descriptors for the objects in the target      directory. +  ... 
is sent to the function (PINT_dev_write_list) which performs  the writev.  PINT_dev_write_list has a local iovec array: struct iovec io_array[10];  The first four elements of io_array are initialized like this for all -responses: +responses::    io_array[0].iov_base = address of local variable "proto_ver" (int32_t)    io_array[0].iov_len = sizeof(int32_t) @@ -475,7 +504,7 @@ responses:                           of global variable vfs_request (vfs_request_t)    io_array[3].iov_len = sizeof(pvfs2_downcall_t) -Readdir responses initialize the fifth element io_array like this: +Readdir responses initialize the fifth element io_array like this::    io_array[4].iov_base = contents of member trailer_buf (char *)                           from out_downcall member of global variable @@ -517,13 +546,13 @@ from a dentry is cheap, obtaining it from userspace is relatively expensive,  hence the motivation to use the dentry when possible.  The timeout values d_time and getattr_time are jiffy based, and the -code is designed to avoid the jiffy-wrap problem: +code is designed to avoid the jiffy-wrap problem:: -"In general, if the clock may have wrapped around more than once, there -is no way to tell how much time has elapsed. However, if the times t1 -and t2 are known to be fairly close, we can reliably compute the -difference in a way that takes into account the possibility that the -clock may have wrapped between times." +    "In general, if the clock may have wrapped around more than once, there +    is no way to tell how much time has elapsed. However, if the times t1 +    and t2 are known to be fairly close, we can reliably compute the +    difference in a way that takes into account the possibility that the +    clock may have wrapped between times." 
-                      from course notes by instructor Andy Wang +from course notes by instructor Andy Wang diff --git a/Documentation/filesystems/overlayfs.rst b/Documentation/filesystems/overlayfs.rst index e443be7928db..c9d2bf96b02d 100644 --- a/Documentation/filesystems/overlayfs.rst +++ b/Documentation/filesystems/overlayfs.rst @@ -40,13 +40,46 @@ On 64bit systems, even if all overlay layers are not on the same  underlying filesystem, the same compliant behavior could be achieved  with the "xino" feature.  The "xino" feature composes a unique object  identifier from the real object st_ino and an underlying fsid index. +  If all underlying filesystems support NFS file handles and export file  handles with 32bit inode number encoding (e.g. ext4), overlay filesystem  will use the high inode number bits for fsid.  Even when the underlying  filesystem uses 64bit inode numbers, users can still enable the "xino"  feature with the "-o xino=on" overlay mount option.  That is useful for the  case of underlying filesystems like xfs and tmpfs, which use 64bit inode -numbers, but are very unlikely to use the high inode number bit. +numbers, but are very unlikely to use the high inode number bits.  In case +the underlying inode number does overflow into the high xino bits, overlay +filesystem will fall back to the non xino behavior for that inode. + +The following table summarizes what can be expected in different overlay +configurations. 
+ +Inode properties +```````````````` + ++--------------+------------+------------+-----------------+----------------+ +|Configuration | Persistent | Uniform    | st_ino == d_ino | d_ino == i_ino | +|              | st_ino     | st_dev     |                 | [*]            | ++==============+=====+======+=====+======+========+========+========+=======+ +|              | dir | !dir | dir | !dir |  dir   +  !dir  |  dir   | !dir  | ++--------------+-----+------+-----+------+--------+--------+--------+-------+ +| All layers   |  Y  |  Y   |  Y  |  Y   |  Y     |   Y    |  Y     |  Y    | +| on same fs   |     |      |     |      |        |        |        |       | ++--------------+-----+------+-----+------+--------+--------+--------+-------+ +| Layers not   |  N  |  Y   |  Y  |  N   |  N     |   Y    |  N     |  Y    | +| on same fs,  |     |      |     |      |        |        |        |       | +| xino=off     |     |      |     |      |        |        |        |       | ++--------------+-----+------+-----+------+--------+--------+--------+-------+ +| xino=on/auto |  Y  |  Y   |  Y  |  Y   |  Y     |   Y    |  Y     |  Y    | +|              |     |      |     |      |        |        |        |       | ++--------------+-----+------+-----+------+--------+--------+--------+-------+ +| xino=on/auto,|  N  |  Y   |  Y  |  N   |  N     |   Y    |  N     |  Y    | +| ino overflow |     |      |     |      |        |        |        |       | ++--------------+-----+------+-----+------+--------+--------+--------+-------+ + +[*] nfsd v3 readdirplus verifies d_ino == i_ino. i_ino is exposed via several +/proc files, such as /proc/locks and /proc/self/fdinfo/<fd> of an inotify +file descriptor.  Upper and Lower @@ -248,6 +281,50 @@ overlay filesystem (though an operation on the name of the file such as  rename or unlink will of course be noticed and handled). 
+Permission model +---------------- + +Permission checking in the overlay filesystem follows these principles: + + 1) permission check SHOULD return the same result before and after copy up + + 2) task creating the overlay mount MUST NOT gain additional privileges + + 3) non-mounting task MAY gain additional privileges through the overlay, + compared to direct access on underlying lower or upper filesystems + +This is achieved by performing two permission checks on each access + + a) check if current task is allowed access based on local DAC (owner, +    group, mode and posix acl), as well as MAC checks + + b) check if mounting task would be allowed real operation on lower or +    upper layer based on underlying filesystem permissions, again including +    MAC checks + +Check (a) ensures consistency (1) since owner, group, mode and posix acls +are copied up.  On the other hand it can result in server enforced +permissions (used by NFS, for example) being ignored (3). + +Check (b) ensures that no task gains permissions to underlying layers that +the mounting task does not have (2).  This also means that it is possible +to create setups where the consistency rule (1) does not hold; normally, +however, the mounting task will have sufficient privileges to perform all +operations. + +Another way to demonstrate this model is drawing parallels between + +  mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,... /merged + +and + +  cp -a /lower /upper +  mount --bind /upper /merged + +The resulting access permissions should be the same.  The difference is in +the time of copy (on-demand vs. up-front). + +  Multiple lower layers  --------------------- @@ -383,7 +460,8 @@ guarantee that the values of st_ino and st_dev returned by stat(2) and the  value of d_ino returned by readdir(3) will act like on a normal filesystem.  E.g. 
the value of st_dev may be different for two objects in the same  overlay filesystem and the value of st_ino for directory objects may not be -persistent and could change even while the overlay filesystem is mounted. +persistent and could change even while the overlay filesystem is mounted, as +summarized in the `Inode properties`_ table above.  Changes to underlying filesystems diff --git a/Documentation/filesystems/path-lookup.rst b/Documentation/filesystems/path-lookup.rst index a3216979298b..f46b05e9b96c 100644 --- a/Documentation/filesystems/path-lookup.rst +++ b/Documentation/filesystems/path-lookup.rst @@ -404,11 +404,8 @@ that is the "next" component in the pathname.  ``int last_type``  ~~~~~~~~~~~~~~~~~ -This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT``, ``LAST_DOTDOT``, or -``LAST_BIND``.  The ``last`` field is only valid if the type is -``LAST_NORM``.  ``LAST_BIND`` is used when following a symlink and no -components of the symlink have been processed yet.  Others should be -fairly self-explanatory. +This is one of ``LAST_NORM``, ``LAST_ROOT``, ``LAST_DOT`` or ``LAST_DOTDOT``. +The ``last`` field is only valid if the type is ``LAST_NORM``.  ``struct path root``  ~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.rst index 99ca040e3f90..38b606991065 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.rst @@ -1,19 +1,20 @@ ------------------------------------------------------------------------------- -                       T H E  /proc   F I L E S Y S T E M ------------------------------------------------------------------------------- -/proc/sys         Terrehon Bowden <[email protected]>        October 7 1999 -                  Bodo Bauer <[email protected]> +.. 
SPDX-License-Identifier: GPL-2.0 + +==================== +The /proc Filesystem +==================== + +=====================  =======================================  ================ +/proc/sys              Terrehon Bowden <[email protected]>,  October 7 1999 +                       Bodo Bauer <[email protected]> +2.4.x update	       Jorge Nerin <[email protected]>   November 14 2000 +move /proc/sys	       Shen Feng <[email protected]>	        April 1 2009 +fixes/update part 1.1  Stefani Seibold <[email protected]>    June 9 2009 +=====================  =======================================  ================ + -2.4.x update	  Jorge Nerin <[email protected]>      November 14 2000 -move /proc/sys	  Shen Feng <[email protected]>		  April 1 2009 ------------------------------------------------------------------------------- -Version 1.3                                              Kernel version 2.2.12 -					      Kernel version 2.4.0-test11-pre4 ------------------------------------------------------------------------------- -fixes/update part 1.1  Stefani Seibold <[email protected]>       June 9 2009 -Table of Contents ------------------ +.. Table of Contents    0     Preface    0.1	Introduction/Credits @@ -50,9 +51,8 @@ Table of Contents    4	Configuring procfs    4.1	Mount options -------------------------------------------------------------------------------  Preface ------------------------------------------------------------------------------- +=======  0.1 Introduction/Credits  ------------------------ @@ -95,20 +95,18 @@ We don't  guarantee  the  correctness  of this document, and if you come to us  complaining about  how  you  screwed  up  your  system  because  of  incorrect  documentation, we won't feel responsible... 
------------------------------------------------------------------------------- -CHAPTER 1: COLLECTING SYSTEM INFORMATION ------------------------------------------------------------------------------- +Chapter 1: Collecting System Information +======================================== -------------------------------------------------------------------------------  In This Chapter ------------------------------------------------------------------------------- +---------------  * Investigating  the  properties  of  the  pseudo  file  system  /proc and its    ability to provide information on the running Linux system  * Examining /proc's structure  * Uncovering  various  information  about the kernel and the processes running    on the system ------------------------------------------------------------------------------- +------------------------------------------------------------------------------  The proc  file  system acts as an interface to internal data structures in the  kernel. It  can  be  used to obtain information about the system and to change @@ -134,9 +132,11 @@ never act on any new process that the kernel may, through chance, have  also assigned the process ID <pid>. Instead, operations on these FDs  usually fail with ESRCH. -Table 1-1: Process specific entries in /proc -.............................................................................. +.. 
table:: Table 1-1: Process specific entries in /proc + + =============  ===============================================================   File		Content + =============  ===============================================================   clear_refs	Clears page referenced bits shown in smaps output   cmdline	Command line arguments   cpu		Current and last cpu in which it was executed	(2.4)(smp) @@ -160,10 +160,10 @@ Table 1-1: Process specific entries in /proc  		can be derived from smaps, but is faster and more convenient   numa_maps	An extension based on maps, showing the memory locality and  		binding policy as well as mem usage (in pages) of each mapping. -.............................................................................. + =============  ===============================================================  For example, to get the status information of a process, all you have to do is -read the file /proc/PID/status: +read the file /proc/PID/status::    >cat /proc/self/status    Name:   cat @@ -222,14 +222,17 @@ contains details information about the process itself.  Its fields are  explained in Table 1-4.  (for SMP CONFIG users) +  For making accounting scalable, RSS related information are handled in an  asynchronous manner and the value may not be very precise. To see a precise  snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.  It's slow but very precise. -Table 1-2: Contents of the status files (as of 4.19) -.............................................................................. +.. 
table:: Table 1-2: Contents of the status files (as of 4.19) + + ==========================  ===================================================   Field                       Content + ==========================  ===================================================   Name                        filename of the executable   Umask                       file mode creation mask   State                       state (R is running, S is sleeping, D is sleeping @@ -254,7 +257,8 @@ Table 1-2: Contents of the status files (as of 4.19)   VmPin                       pinned memory size   VmHWM                       peak resident set size ("high water mark")   VmRSS                       size of memory portions. It contains the three -                             following parts (VmRSS = RssAnon + RssFile + RssShmem) +                             following parts +                             (VmRSS = RssAnon + RssFile + RssShmem)   RssAnon                     size of resident anonymous memory   RssFile                     size of resident file mappings   RssShmem                    size of resident shmem memory (includes SysV shm, @@ -292,27 +296,32 @@ Table 1-2: Contents of the status files (as of 4.19)   Mems_allowed_list           Same as previous, but in "list format"   voluntary_ctxt_switches     number of voluntary context switches   nonvoluntary_ctxt_switches  number of non voluntary context switches -.............................................................................. + ==========================  =================================================== -Table 1-3: Contents of the statm files (as of 2.6.8-rc3) -.............................................................................. + +.. 
table:: Table 1-3: Contents of the statm files (as of 2.6.8-rc3) + + ======== ===============================	==============================   Field    Content + ======== ===============================	==============================   size     total program size (pages)		(same as VmSize in status)   resident size of memory portions (pages)	(same as VmRSS in status)   shared   number of pages that are shared	(i.e. backed by a file, same  						as RssFile+RssShmem in status)   trs      number of pages that are 'code'	(not including libs; broken, -							includes data segment) +						includes data segment)   lrs      number of pages of library		(always 0 on 2.6)   drs      number of pages of data/stack		(including libs; broken, -							includes library text) +						includes library text)   dt       number of dirty pages			(always 0 on 2.6) -.............................................................................. + ======== ===============================	============================== + +.. table:: Table 1-4: Contents of the stat files (as of 2.6.30-rc7) -Table 1-4: Contents of the stat files (as of 2.6.30-rc7) -.............................................................................. 
- Field          Content +  ============= =============================================================== +  Field         Content +  ============= ===============================================================    pid           process id    tcomm         filename of the executable    state         state (R is running, S is sleeping, D is sleeping in an @@ -348,7 +357,8 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)    blocked       bitmap of blocked signals    sigign        bitmap of ignored signals    sigcatch      bitmap of caught signals -  0		(place holder, used to be the wchan address, use /proc/PID/wchan instead) +  0		(place holder, used to be the wchan address, +		use /proc/PID/wchan instead)    0             (place holder)    0             (place holder)    exit_signal   signal to send to parent thread on exit @@ -365,39 +375,40 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)    arg_end       address below which program command line is placed    env_start     address above which program environment is placed    env_end       address below which program environment is placed -  exit_code     the thread's exit_code in the form reported by the waitpid system call -.............................................................................. +  exit_code     the thread's exit_code in the form reported by the waitpid +		system call +  ============= ===============================================================  The /proc/PID/maps file contains the currently mapped memory regions and  their access permissions. 
-The format is: - -address           perms offset  dev   inode      pathname - -08048000-08049000 r-xp 00000000 03:00 8312       /opt/test -08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test -0804a000-0806b000 rw-p 00000000 00:00 0          [heap] -a7cb1000-a7cb2000 ---p 00000000 00:00 0 -a7cb2000-a7eb2000 rw-p 00000000 00:00 0 -a7eb2000-a7eb3000 ---p 00000000 00:00 0 -a7eb3000-a7ed5000 rw-p 00000000 00:00 0 -a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6 -a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6 -a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6 -a800b000-a800e000 rw-p 00000000 00:00 0 -a800e000-a8022000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0 -a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0 -a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0 -a8024000-a8027000 rw-p 00000000 00:00 0 -a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2 -a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2 -a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2 -aff35000-aff4a000 rw-p 00000000 00:00 0          [stack] -ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso] +The format is:: + +    address           perms offset  dev   inode      pathname + +    08048000-08049000 r-xp 00000000 03:00 8312       /opt/test +    08049000-0804a000 rw-p 00001000 03:00 8312       /opt/test +    0804a000-0806b000 rw-p 00000000 00:00 0          [heap] +    a7cb1000-a7cb2000 ---p 00000000 00:00 0 +    a7cb2000-a7eb2000 rw-p 00000000 00:00 0 +    a7eb2000-a7eb3000 ---p 00000000 00:00 0 +    a7eb3000-a7ed5000 rw-p 00000000 00:00 0 +    a7ed5000-a8008000 r-xp 00000000 03:00 4222       /lib/libc.so.6 +    a8008000-a800a000 r--p 00133000 03:00 4222       /lib/libc.so.6 +    a800a000-a800b000 rw-p 00135000 03:00 4222       /lib/libc.so.6 +    a800b000-a800e000 rw-p 00000000 00:00 0 +    a800e000-a8022000 r-xp 00000000 03:00 14462      
/lib/libpthread.so.0 +    a8022000-a8023000 r--p 00013000 03:00 14462      /lib/libpthread.so.0 +    a8023000-a8024000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0 +    a8024000-a8027000 rw-p 00000000 00:00 0 +    a8027000-a8043000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2 +    a8043000-a8044000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2 +    a8044000-a8045000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2 +    aff35000-aff4a000 rw-p 00000000 00:00 0          [stack] +    ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]  where "address" is the address space in the process that it occupies, "perms" -is a set of permissions: +is a set of permissions::   r = read   w = write @@ -411,42 +422,44 @@ with the memory region, as the case would be with BSS (uninitialized data).  The "pathname" shows the name associated file for this mapping.  If the mapping  is not associated with a file: - [heap]                   = the heap of the program - [stack]                  = the stack of the main process - [vdso]                   = the "virtual dynamic shared object", + =======                    ==================================== + [heap]                     the heap of the program + [stack]                    the stack of the main process + [vdso]                     the "virtual dynamic shared object",                              the kernel system call handler + =======                    ====================================   or if empty, the mapping is anonymous.  The /proc/PID/smaps is an extension based on maps, showing the memory  consumption for each of the process's mappings. 
For each mapping (aka Virtual -Memory Area, or VMA) there is a series of lines such as the following: - -08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash - -Size:               1084 kB -KernelPageSize:        4 kB -MMUPageSize:           4 kB -Rss:                 892 kB -Pss:                 374 kB -Shared_Clean:        892 kB -Shared_Dirty:          0 kB -Private_Clean:         0 kB -Private_Dirty:         0 kB -Referenced:          892 kB -Anonymous:             0 kB -LazyFree:              0 kB -AnonHugePages:         0 kB -ShmemPmdMapped:        0 kB -Shared_Hugetlb:        0 kB -Private_Hugetlb:       0 kB -Swap:                  0 kB -SwapPss:               0 kB -KernelPageSize:        4 kB -MMUPageSize:           4 kB -Locked:                0 kB -THPeligible:           0 -VmFlags: rd ex mr mw me dw +Memory Area, or VMA) there is a series of lines such as the following:: + +    08048000-080bc000 r-xp 00000000 03:02 13130      /bin/bash + +    Size:               1084 kB +    KernelPageSize:        4 kB +    MMUPageSize:           4 kB +    Rss:                 892 kB +    Pss:                 374 kB +    Shared_Clean:        892 kB +    Shared_Dirty:          0 kB +    Private_Clean:         0 kB +    Private_Dirty:         0 kB +    Referenced:          892 kB +    Anonymous:             0 kB +    LazyFree:              0 kB +    AnonHugePages:         0 kB +    ShmemPmdMapped:        0 kB +    Shared_Hugetlb:        0 kB +    Private_Hugetlb:       0 kB +    Swap:                  0 kB +    SwapPss:               0 kB +    KernelPageSize:        4 kB +    MMUPageSize:           4 kB +    Locked:                0 kB +    THPeligible:           0 +    VmFlags: rd ex mr mw me dw  The first of these lines shows the same information as is displayed for the  mapping in /proc/PID/maps.  
Following lines show the size of the mapping @@ -461,26 +474,35 @@ The "proportional set size" (PSS) of a process is the count of pages it has  in memory, where each page is divided by the number of processes sharing it.  So if a process has 1000 pages all to itself, and 1000 shared with one other  process, its PSS will be 1500. +  Note that even a page which is part of a MAP_SHARED mapping, but has only  a single pte mapped, i.e.  is currently used by only one process, is accounted  as private and not as shared. +  "Referenced" indicates the amount of memory currently marked as referenced or  accessed. +  "Anonymous" shows the amount of memory that does not belong to any file.  Even  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE  and a page is modified, the file page is replaced by a private anonymous copy. +  "LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).  The memory isn't freed immediately with madvise(). It's freed in memory  pressure if the memory is clean. Please note that the printed value might  be lower than the real value due to optimizations used in the current  implementation. If this is not desirable please file a bug report. +  "AnonHugePages" shows the ammount of memory backed by transparent hugepage. +  "ShmemPmdMapped" shows the ammount of shared (shmem/tmpfs) memory backed by  huge pages. +  "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by  hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical  reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field. +  "Swap" shows how much would-be-anonymous memory is also used, but out on swap. +  For shmem mappings, "Swap" includes also the size of the mapped (and not  replaced by copy-on-write) part of the underlying shmem object out on swap.  "SwapPss" shows proportional swap share of this mapping. 
Unlike "Swap", this @@ -489,36 +511,39 @@ does not take into account swapped out page of underlying shmem objects.  "THPeligible" indicates whether the mapping is eligible for allocating THP  pages - 1 if true, 0 otherwise. It just shows the current status. -"VmFlags" field deserves a separate description. This member represents the kernel -flags associated with the particular virtual memory area in two letter encoded -manner. The codes are the following: -    rd  - readable -    wr  - writeable -    ex  - executable -    sh  - shared -    mr  - may read -    mw  - may write -    me  - may execute -    ms  - may share -    gd  - stack segment growns down -    pf  - pure PFN range -    dw  - disabled write to the mapped file -    lo  - pages are locked in memory -    io  - memory mapped I/O area -    sr  - sequential read advise provided -    rr  - random read advise provided -    dc  - do not copy area on fork -    de  - do not expand area on remapping -    ac  - area is accountable -    nr  - swap space is not reserved for the area -    ht  - area uses huge tlb pages -    ar  - architecture specific flag -    dd  - do not include area into core dump -    sd  - soft-dirty flag -    mm  - mixed map area -    hg  - huge page advise flag -    nh  - no-huge page advise flag -    mg  - mergable advise flag +"VmFlags" field deserves a separate description. This member represents the +kernel flags associated with the particular virtual memory area in two letter +encoded manner. 
The codes are the following:
+
+    ==    =======================================
+    rd    readable
+    wr    writeable
+    ex    executable
+    sh    shared
+    mr    may read
+    mw    may write
+    me    may execute
+    ms    may share
+    gd    stack segment growns down
+    pf    pure PFN range
+    dw    disabled write to the mapped file
+    lo    pages are locked in memory
+    io    memory mapped I/O area
+    sr    sequential read advise provided
+    rr    random read advise provided
+    dc    do not copy area on fork
+    de    do not expand area on remapping
+    ac    area is accountable
+    nr    swap space is not reserved for the area
+    ht    area uses huge tlb pages
+    ar    architecture specific flag
+    dd    do not include area into core dump
+    sd    soft dirty flag
+    mm    mixed map area
+    hg    huge page advise flag
+    nh    no huge page advise flag
+    mg    mergable advise flag
+    ==    =======================================
 Note that there is no guarantee that every flag and associated mnemonic will
 be present in all further kernel releases. Things get changed, the flags may
@@ -531,6 +556,7 @@ enabled.
 Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy (consistent
 output can be achieved only in the single read call).
+
 This typically manifests when doing partial reads of these files while the
 memory map is being modified.  Despite the races, we do provide the following
 guarantees:
@@ -544,9 +570,9 @@ The /proc/PID/smaps_rollup file includes the same fields as /proc/PID/smaps,
 but their values are the sums of the corresponding values for all mappings of
 the process.  Additionally, it contains these fields:
-Pss_Anon
-Pss_File
-Pss_Shmem
+- Pss_Anon
+- Pss_File
+- Pss_Shmem
 They represent the proportional shares of anonymous, file, and shmem pages, as
 described for smaps above.
These fields are omitted in smaps since each
@@ -558,20 +584,25 @@ The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
 bits on both physical and virtual pages associated with a process, and the
 soft-dirty bit on pte (see Documentation/admin-guide/mm/soft-dirty.rst
 for details).
-To clear the bits for all the pages associated with the process
+To clear the bits for all the pages associated with the process::
+
     > echo 1 > /proc/PID/clear_refs
-To clear the bits for the anonymous pages associated with the process
+To clear the bits for the anonymous pages associated with the process::
+
     > echo 2 > /proc/PID/clear_refs
-To clear the bits for the file mapped pages associated with the process
+To clear the bits for the file mapped pages associated with the process::
+
     > echo 3 > /proc/PID/clear_refs
-To clear the soft-dirty bit
+To clear the soft-dirty bit::
+
     > echo 4 > /proc/PID/clear_refs
 To reset the peak resident set size ("high water mark") to the process's
-current value:
+current value::
+
     > echo 5 > /proc/PID/clear_refs
 Any other value written to /proc/PID/clear_refs will have no effect.
@@ -584,30 +615,33 @@ Documentation/admin-guide/mm/pagemap.rst.
 The /proc/pid/numa_maps is an extension based on maps, showing the memory
 locality and binding policy, as well as the memory usage (in pages) of
 each mapping.
The output follows a general format where mapping details get
-summarized separated by blank spaces, one mapping per each file line:
-
-address   policy    mapping details
-
-00400000 default file=/usr/local/bin/app mapped=1 active=0 N3=1 kernelpagesize_kB=4
-00600000 default file=/usr/local/bin/app anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 N0=24 N3=2 kernelpagesize_kB=4
-320621f000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206220000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206221000 default anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206800000 default file=/lib64/libc-2.12.so mapped=59 mapmax=21 active=55 N0=41 N3=18 kernelpagesize_kB=4
-320698b000 default file=/lib64/libc-2.12.so
-3206b8a000 default file=/lib64/libc-2.12.so anon=2 dirty=2 N3=2 kernelpagesize_kB=4
-3206b8e000 default file=/lib64/libc-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
-3206b8f000 default anon=3 dirty=3 active=1 N3=3 kernelpagesize_kB=4
-7f4dc10a2000 default anon=3 dirty=3 N3=3 kernelpagesize_kB=4
-7f4dc10b4000 default anon=2 dirty=2 active=1 N3=2 kernelpagesize_kB=4
-7f4dc1200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N3=1 kernelpagesize_kB=2048
-7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4
-7fff3369d000 default mapped=1 mapmax=35 active=0 N3=1 kernelpagesize_kB=4
+summarized separated by blank spaces, one mapping per each file line::
+
+    address   policy    mapping details
+
+    00400000 default file=/usr/local/bin/app mapped=1 active=0 N3=1 kernelpagesize_kB=4
+    00600000 default file=/usr/local/bin/app anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 N0=24 N3=2 kernelpagesize_kB=4
+    320621f000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206220000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206221000 default anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206800000 default file=/lib64/libc-2.12.so mapped=59 mapmax=21 active=55 N0=41 N3=18 kernelpagesize_kB=4
+    320698b000 default file=/lib64/libc-2.12.so
+    3206b8a000 default file=/lib64/libc-2.12.so anon=2 dirty=2 N3=2 kernelpagesize_kB=4
+    3206b8e000 default file=/lib64/libc-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4
+    3206b8f000 default anon=3 dirty=3 active=1 N3=3 kernelpagesize_kB=4
+    7f4dc10a2000 default anon=3 dirty=3 N3=3 kernelpagesize_kB=4
+    7f4dc10b4000 default anon=2 dirty=2 active=1 N3=2 kernelpagesize_kB=4
+    7f4dc1200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N3=1 kernelpagesize_kB=2048
+    7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4
+    7fff3369d000 default mapped=1 mapmax=35 active=0 N3=1 kernelpagesize_kB=4
 Where:
+
 "address" is the starting address for the mapping;
+
 "policy" reports the NUMA memory policy set for the mapping (see Documentation/admin-guide/mm/numa_memory_policy.rst);
+
 "mapping details" summarizes mapping data such as mapping type, page usage counters,
 node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page
 size, in KB, that is backing the mapping up.
@@ -621,81 +655,83 @@ the running kernel. The files used to obtain this information are contained in
 system. It  depends  on the kernel configuration and the loaded modules, which
 files are there, and which are missing.
-Table 1-5: Kernel info in /proc
-..............................................................................
- File        Content
- apm         Advanced power management info
- buddyinfo   Kernel memory allocator information (see text)	(2.5)
- bus         Directory containing bus specific information
- cmdline     Kernel command line
- cpuinfo     Info about the CPU
- devices     Available devices (block and character)
- dma         Used DMS channels
- filesystems Supported filesystems
- driver	     Various drivers grouped here, currently rtc (2.4)
- execdomains Execdomains, related to security			(2.4)
- fb	     Frame Buffer devices				(2.4)
- fs	     File system parameters, currently nfs/exports	(2.4)
- ide         Directory containing info about the IDE subsystem
- interrupts  Interrupt usage
- iomem	     Memory map						(2.4)
- ioports     I/O port usage
- irq	     Masks for irq to cpu affinity			(2.4)(smp?)
- isapnp	     ISA PnP (Plug&Play) Info				(2.4)
- kcore       Kernel core image (can be ELF or A.OUT(deprecated in 2.4))
- kmsg        Kernel messages
- ksyms       Kernel symbol table
- loadavg     Load average of last 1, 5 & 15 minutes
- locks       Kernel locks
- meminfo     Memory info
- misc        Miscellaneous
- modules     List of loaded modules
- mounts      Mounted filesystems
- net         Networking info (see text)
+.. table:: Table 1-5: Kernel info in /proc
+
+ ============ ===============================================================
+ File         Content
+ ============ ===============================================================
+ apm          Advanced power management info
+ buddyinfo    Kernel memory allocator information (see text)	(2.5)
+ bus          Directory containing bus specific information
+ cmdline      Kernel command line
+ cpuinfo      Info about the CPU
+ devices      Available devices (block and character)
+ dma          Used DMS channels
+ filesystems  Supported filesystems
+ driver       Various drivers grouped here, currently rtc	(2.4)
+ execdomains  Execdomains, related to security			(2.4)
+ fb 	      Frame Buffer devices				(2.4)
+ fs 	      File system parameters, currently nfs/exports	(2.4)
+ ide          Directory containing info about the IDE subsystem
+ interrupts   Interrupt usage
+ iomem 	      Memory map					(2.4)
+ ioports      I/O port usage
+ irq 	      Masks for irq to cpu affinity			(2.4)(smp?)
+ isapnp       ISA PnP (Plug&Play) Info				(2.4)
+ kcore        Kernel core image (can be ELF or A.OUT(deprecated in 2.4))
+ kmsg         Kernel messages
+ ksyms        Kernel symbol table
+ loadavg      Load average of last 1, 5 & 15 minutes
+ locks        Kernel locks
+ meminfo      Memory info
+ misc         Miscellaneous
+ modules      List of loaded modules
+ mounts       Mounted filesystems
+ net          Networking info (see text)
  pagetypeinfo Additional page allocator information (see text)  (2.5)
- partitions  Table of partitions known to the system
- pci	     Deprecated info of PCI bus (new way -> /proc/bus/pci/,
-             decoupled by lspci					(2.4)
- rtc         Real time clock
- scsi        SCSI info (see text)
- slabinfo    Slab pool info
- softirqs    softirq usage
- stat        Overall statistics
- swaps       Swap space utilization
- sys         See chapter 2
- sysvipc     Info of SysVIPC Resources (msg, sem, shm)		(2.4)
- tty	     Info of tty drivers
- uptime      Wall clock since boot, combined idle time of all cpus
- version     Kernel version
- video	     bttv info of video resources			(2.4)
- vmallocinfo Show vmalloced areas
-..............................................................................
+ partitions   Table of partitions known to the system
+ pci 	      Deprecated info of PCI bus (new way -> /proc/bus/pci/,
+              decoupled by lspci				(2.4)
+ rtc          Real time clock
+ scsi         SCSI info (see text)
+ slabinfo     Slab pool info
+ softirqs     softirq usage
+ stat         Overall statistics
+ swaps        Swap space utilization
+ sys          See chapter 2
+ sysvipc      Info of SysVIPC Resources (msg, sem, shm)		(2.4)
+ tty 	      Info of tty drivers
+ uptime       Wall clock since boot, combined idle time of all cpus
+ version      Kernel version
+ video 	      bttv info of video resources			(2.4)
+ vmallocinfo  Show vmalloced areas
+ ============ ===============================================================
 You can,  for  example,  check  which interrupts are currently in use and what
-they are used for by looking in the file /proc/interrupts:
-
-  > cat /proc/interrupts
-             CPU0
-    0:    8728810          XT-PIC  timer
-    1:        895          XT-PIC  keyboard
-    2:          0          XT-PIC  cascade
-    3:     531695          XT-PIC  aha152x
-    4:    2014133          XT-PIC  serial
-    5:      44401          XT-PIC  pcnet_cs
-    8:          2          XT-PIC  rtc
-   11:          8          XT-PIC  i82365
-   12:     182918          XT-PIC  PS/2 Mouse
-   13:          1          XT-PIC  fpu
-   14:    1232265          XT-PIC  ide0
-   15:          7          XT-PIC  ide1
-  NMI:          0
+they are used for by looking in the file /proc/interrupts::
+
+  > cat /proc/interrupts
+             CPU0
+    0:    8728810          XT-PIC  timer
+    1:        895          XT-PIC  keyboard
+    2:          0          XT-PIC  cascade
+    3:     531695          XT-PIC  aha152x
+    4:    2014133          XT-PIC  serial
+    5:      44401          XT-PIC  pcnet_cs
+    8:          2          XT-PIC  rtc
+   11:          8          XT-PIC  i82365
+   12:     182918          XT-PIC  PS/2 Mouse
+   13:          1          XT-PIC  fpu
+   14:    1232265          XT-PIC  ide0
+   15:          7          XT-PIC  ide1
+  NMI:          0
 In 2.4.* a couple of lines where added to this file LOC & ERR (this time is the
-output of a SMP machine):
+output of a SMP machine)::
-  > cat /proc/interrupts
+  > cat /proc/interrupts
-             CPU0       CPU1
+             CPU0       CPU1
     0:    1243498    1214548    IO-APIC-edge  timer
     1:       8949       8958    IO-APIC-edge  keyboard
     2:          0          0          XT-PIC  cascade
@@ -708,8 +744,8 @@ output of a SMP machine):
    15:       2183       2415    IO-APIC-edge  ide1
    17:      30564      30414   IO-APIC-level  eth0
    18:        177        164   IO-APIC-level  bttv
-  NMI:    2457961    2457959
-  LOC:    2457882    2457881
+  NMI:    2457961    2457959
+  LOC:    2457882    2457881
   ERR:       2155
 NMI is incremented in this case because every timer interrupt generates a NMI
@@ -726,21 +762,25 @@ In 2.6.2* /proc/interrupts was expanded again.  This time the goal was for
 /proc/interrupts to display every IRQ vector in use by the system, not
 just those considered 'most important'.  The new vectors are:
-  THR -- interrupt raised when a machine check threshold counter
+THR
+  interrupt raised when a machine check threshold counter
   (typically counting ECC corrected errors of memory or cache) exceeds
   a configurable threshold.  Only available on some systems.
-  TRM -- a thermal event interrupt occurs when a temperature threshold
+TRM
+  a thermal event interrupt occurs when a temperature threshold
   has been exceeded for the CPU.  This interrupt may also be generated
   when the temperature drops back to normal.
-  SPU -- a spurious interrupt is some interrupt that was raised then lowered
+SPU
+  a spurious interrupt is some interrupt that was raised then lowered
   by some IO device before it could be fully processed by the APIC.
Hence
   the APIC sees the interrupt but does not know what device it came from.
   For this case the APIC will generate the interrupt with a IRQ vector
   of 0xff. This might also be generated by chipset bugs.
-  RES, CAL, TLB -- rescheduling, call and TLB flush interrupts are
+RES, CAL, TLB]
+  rescheduling, call and TLB flush interrupts are
   sent from one CPU to another per the needs of the OS.  Typically,
   their statistics are used by kernel developers and interested users to
   determine the occurrence of interrupts of the given type.
@@ -756,7 +796,8 @@ IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
 irq subdir is one subdir for each IRQ, and two files; default_smp_affinity and
 prof_cpu_mask.
-For example
+For example::
+
   > ls /proc/irq/
   0  10  12  14  16  18  2  4  6  8  prof_cpu_mask
   1  11  13  15  17  19  3  5  7  9  default_smp_affinity
@@ -764,20 +805,20 @@ For example
   smp_affinity
 smp_affinity is a bitmask, in which you can specify which CPUs can handle the
-IRQ, you can set it by doing:
+IRQ, you can set it by doing::
   > echo 1 > /proc/irq/10/smp_affinity
 This means that only the first CPU will handle the IRQ, but you can also echo
 5 which means that only the first and third CPU can handle the IRQ.
-The contents of each smp_affinity file is the same by default:
+The contents of each smp_affinity file is the same by default::
   > cat /proc/irq/0/smp_affinity
   ffffffff
 There is an alternate interface, smp_affinity_list which allows specifying
-a cpu range instead of a bitmask:
+a cpu range instead of a bitmask::
   > cat /proc/irq/0/smp_affinity_list
   1024-1031
@@ -810,46 +851,46 @@ Linux uses  slab  pools for memory management above page level in version 2.2.
 Commonly used  objects  have  their  own  slab  pool (such as network buffers,
 directory cache, and so on).
-..............................................................................
+::
-> cat /proc/buddyinfo
+    > cat /proc/buddyinfo
-Node 0, zone      DMA      0      4      5      4      4      3 ...
-Node 0, zone   Normal      1      0      0      1    101      8 ...
-Node 0, zone  HighMem      2      0      0      1      1      0 ...
+    Node 0, zone      DMA      0      4      5      4      4      3 ...
+    Node 0, zone   Normal      1      0      0      1    101      8 ...
+    Node 0, zone  HighMem      2      0      0      1      1      0 ...
 External fragmentation is a problem under some workloads, and buddyinfo is a
-useful tool for helping diagnose these problems.  Buddyinfo will give you a
+useful tool for helping diagnose these problems.  Buddyinfo will give you a
 clue as to how big an area you can safely allocate, or why a previous
 allocation failed.
-Each column represents the number of pages of a certain order which are
-available.  In this case, there are 0 chunks of 2^0*PAGE_SIZE available in
-ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
-available in ZONE_NORMAL, etc...
+Each column represents the number of pages of a certain order which are
+available.  In this case, there are 0 chunks of 2^0*PAGE_SIZE available in
+ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
+available in ZONE_NORMAL, etc...
 More information relevant to external fragmentation can be found in
-pagetypeinfo.
-
-> cat /proc/pagetypeinfo
-Page block order: 9
-Pages per block:  512
-
-Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
-Node    0, zone      DMA, type    Unmovable      0      0      0      1      1      1      1      1      1      1      0
-Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
-Node    0, zone      DMA, type      Movable      1      1      2      1      2      1      1      0      1      0      2
-Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
-Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
-Node    0, zone    DMA32, type    Unmovable    103     54     77      1      1      1     11      8      7      1      9
-Node    0, zone    DMA32, type  Reclaimable      0      0      2      1      0      0      0      0      1      0      0
-Node    0, zone    DMA32, type      Movable    169    152    113     91     77     54     39     13      6      1    452
-Node    0, zone    DMA32, type      Reserve      1      2      2      2      2      0      1      1      1      1      0
-Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
-
-Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
-Node 0, zone      DMA            2            0            5            1            0
-Node 0, zone    DMA32           41            6          967            2            0
+pagetypeinfo::
+
+    > cat /proc/pagetypeinfo
+    Page block order: 9
+    Pages per block:  512
+
+    Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
+    Node    0, zone      DMA, type    Unmovable      0      0      0      1      1      1      1      1      1      1      0
+    Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
+    Node    0, zone      DMA, type      Movable      1      1      2      1      2      1      1      0      1      0      2
+    Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      1      0
+    Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
+    Node    0, zone    DMA32, type    Unmovable    103     54     77      1      1      1     11      8      7      1      9
+    Node    0, zone    DMA32, type  Reclaimable      0      0      2      1      0      0      0      0      1      0      0
+    Node    0, zone    DMA32, type      Movable    169    152    113     91     77     54     39     13      6      1    452
+    Node    0, zone    DMA32, type      Reserve      1      2      2      2      2      0      1      1      1      1      0
+    Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
+
+    Number of blocks type     Unmovable  Reclaimable      Movable      Reserve      Isolate
+    Node 0, zone      DMA            2            0            5            1            0
+    Node 0, zone    DMA32           41            6          967            2            0
 Fragmentation avoidance in the kernel works by grouping pages of different
 migrate types into the same contiguous regions of memory called page blocks.
@@ -870,59 +911,63 @@ unless memory has been mlock()'d. Some of the Reclaimable blocks should
 also be allocatable although a lot of filesystem metadata may have to be
 reclaimed to achieve this.
-..............................................................................
-meminfo:
+meminfo
+~~~~~~~
 Provides information about distribution and utilization of memory.
This
 varies by architecture and compile options.  The following is from a
 16GB PIII, which has highmem enabled.  You may not have all of these fields.
-> cat /proc/meminfo
-
-MemTotal:     16344972 kB
-MemFree:      13634064 kB
-MemAvailable: 14836172 kB
-Buffers:          3656 kB
-Cached:        1195708 kB
-SwapCached:          0 kB
-Active:         891636 kB
-Inactive:      1077224 kB
-HighTotal:    15597528 kB
-HighFree:     13629632 kB
-LowTotal:       747444 kB
-LowFree:          4432 kB
-SwapTotal:           0 kB
-SwapFree:            0 kB
-Dirty:             968 kB
-Writeback:           0 kB
-AnonPages:      861800 kB
-Mapped:         280372 kB
-Shmem:             644 kB
-KReclaimable:   168048 kB
-Slab:           284364 kB
-SReclaimable:   159856 kB
-SUnreclaim:     124508 kB
-PageTables:      24448 kB
-NFS_Unstable:        0 kB
-Bounce:              0 kB
-WritebackTmp:        0 kB
-CommitLimit:   7669796 kB
-Committed_AS:   100056 kB
-VmallocTotal:   112216 kB
-VmallocUsed:       428 kB
-VmallocChunk:   111088 kB
-Percpu:          62080 kB
-HardwareCorrupted:   0 kB
-AnonHugePages:   49152 kB
-ShmemHugePages:      0 kB
-ShmemPmdMapped:      0 kB
-
-
-    MemTotal: Total usable ram (i.e. physical ram minus a few reserved
+::
+
+    > cat /proc/meminfo
+
+    MemTotal:     16344972 kB
+    MemFree:      13634064 kB
+    MemAvailable: 14836172 kB
+    Buffers:          3656 kB
+    Cached:        1195708 kB
+    SwapCached:          0 kB
+    Active:         891636 kB
+    Inactive:      1077224 kB
+    HighTotal:    15597528 kB
+    HighFree:     13629632 kB
+    LowTotal:       747444 kB
+    LowFree:          4432 kB
+    SwapTotal:           0 kB
+    SwapFree:            0 kB
+    Dirty:             968 kB
+    Writeback:           0 kB
+    AnonPages:      861800 kB
+    Mapped:         280372 kB
+    Shmem:             644 kB
+    KReclaimable:   168048 kB
+    Slab:           284364 kB
+    SReclaimable:   159856 kB
+    SUnreclaim:     124508 kB
+    PageTables:      24448 kB
+    NFS_Unstable:        0 kB
+    Bounce:              0 kB
+    WritebackTmp:        0 kB
+    CommitLimit:   7669796 kB
+    Committed_AS:   100056 kB
+    VmallocTotal:   112216 kB
+    VmallocUsed:       428 kB
+    VmallocChunk:   111088 kB
+    Percpu:          62080 kB
+    HardwareCorrupted:   0 kB
+    AnonHugePages:   49152 kB
+    ShmemHugePages:      0 kB
+    ShmemPmdMapped:      0 kB
+
+MemTotal
+              Total usable ram (i.e. physical ram minus a few reserved
               bits and the kernel binary code)
-     MemFree: The sum of LowFree+HighFree
-MemAvailable: An estimate of how much memory is available for starting new
+MemFree
+              The sum of LowFree+HighFree
+MemAvailable
+              An estimate of how much memory is available for starting new
               applications, without swapping. Calculated from MemFree,
               SReclaimable, the size of the file LRU lists, and the low
               watermarks in each zone.
@@ -930,69 +975,99 @@ MemAvailable: An estimate of how much memory is available for starting new
               page cache to function well, and that not all reclaimable
               slab will be reclaimable, due to items being in use. The
               impact of those factors will vary from system to system.
-     Buffers: Relatively temporary storage for raw disk blocks
+Buffers
+              Relatively temporary storage for raw disk blocks
               shouldn't get tremendously large (20MB or so)
-      Cached: in-memory cache for files read from the disk (the
+Cached
+              in-memory cache for files read from the disk (the
               pagecache).  Doesn't include SwapCached
-  SwapCached: Memory that once was swapped out, is swapped back in but
+SwapCached
+              Memory that once was swapped out, is swapped back in but
               still also is in the swapfile (if memory is needed it
               doesn't need to be swapped out AGAIN because it is already
               in the swapfile. This saves I/O)
-      Active: Memory that has been used more recently and usually not
+Active
+              Memory that has been used more recently and usually not
               reclaimed unless absolutely necessary.
-    Inactive: Memory which has been less recently used.  It is more
+Inactive
+              Memory which has been less recently used.  It is more
               eligible to be reclaimed for other purposes
-   HighTotal:
-    HighFree: Highmem is all memory above ~860MB of physical memory
+HighTotal, HighFree
+              Highmem is all memory above ~860MB of physical memory
               Highmem areas are for use by userspace programs, or
               for the pagecache.  The kernel must use tricks to access
               this memory, making it slower to access than lowmem.
-    LowTotal:
-     LowFree: Lowmem is memory which can be used for everything that
+LowTotal, LowFree
+              Lowmem is memory which can be used for everything that
               highmem can be used for, but it is also available for the
               kernel's use for its own data structures.  Among many
               other things, it is where everything from the Slab is
               allocated.  Bad things happen when you're out of lowmem.
-   SwapTotal: total amount of swap space available
-    SwapFree: Memory which has been evicted from RAM, and is temporarily
+SwapTotal
+              total amount of swap space available
+SwapFree
+              Memory which has been evicted from RAM, and is temporarily
               on the disk
-       Dirty: Memory which is waiting to get written back to the disk
-   Writeback: Memory which is actively being written back to the disk
-   AnonPages: Non-file backed pages mapped into userspace page tables
-HardwareCorrupted: The amount of RAM/memory in KB, the kernel identifies as
+Dirty
+              Memory which is waiting to get written back to the disk
+Writeback
+              Memory which is actively being written back to the disk
+AnonPages
+              Non-file backed pages mapped into userspace page tables
+HardwareCorrupted
+              The amount of RAM/memory in KB, the kernel identifies as
 	      corrupted.
-AnonHugePages: Non-file backed huge pages mapped into userspace page tables
-      Mapped: files which have been mmaped, such as libraries
-       Shmem: Total memory used by shared memory (shmem) and tmpfs
-ShmemHugePages: Memory used by shared memory (shmem) and tmpfs allocated
+AnonHugePages
+              Non-file backed huge pages mapped into userspace page tables
+Mapped
+              files which have been mmaped, such as libraries
+Shmem
+              Total memory used by shared memory (shmem) and tmpfs
+ShmemHugePages
+              Memory used by shared memory (shmem) and tmpfs allocated
               with huge pages
-ShmemPmdMapped: Shared memory mapped into userspace with huge pages
-KReclaimable: Kernel allocations that the kernel will attempt to reclaim
+ShmemPmdMapped
+              Shared memory mapped into userspace with huge pages
+KReclaimable
+              Kernel allocations that the kernel will attempt to reclaim
               under memory pressure. Includes SReclaimable (below), and other
               direct allocations with a shrinker.
-        Slab: in-kernel data structures cache
-SReclaimable: Part of Slab, that might be reclaimed, such as caches
-  SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
-  PageTables: amount of memory dedicated to the lowest level of page
+Slab
+              in-kernel data structures cache
+SReclaimable
+              Part of Slab, that might be reclaimed, such as caches
+SUnreclaim
+              Part of Slab, that cannot be reclaimed on memory pressure
+PageTables
+              amount of memory dedicated to the lowest level of page
               tables.
-NFS_Unstable: NFS pages sent to the server, but not yet committed to stable
+NFS_Unstable
+              NFS pages sent to the server, but not yet committed to stable
 	      storage
-      Bounce: Memory used for block device "bounce buffers"
-WritebackTmp: Memory used by FUSE for temporary writeback buffers
- CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
+Bounce
+              Memory used for block device "bounce buffers"
+WritebackTmp
+              Memory used by FUSE for temporary writeback buffers
+CommitLimit
+              Based on the overcommit ratio ('vm.overcommit_ratio'),
               this is the total amount of  memory currently available to
               be allocated on the system. This limit is only adhered to
               if strict overcommit accounting is enabled (mode 2 in
               'vm.overcommit_memory').
-              The CommitLimit is calculated with the following formula:
-              CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
-                             overcommit_ratio / 100 + [total swap pages]
+
+              The CommitLimit is calculated with the following formula::
+
+                CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
+                               overcommit_ratio / 100 + [total swap pages]
+
               For example, on a system with 1G of physical RAM and 7G
               of swap with a `vm.overcommit_ratio` of 30 it would
               yield a CommitLimit of 7.3G.
+
               For more details, see the memory overcommit documentation
               in vm/overcommit-accounting.
-Committed_AS: The amount of memory presently allocated on the system.
+Committed_AS
+              The amount of memory presently allocated on the system.
               The committed memory is a sum of all of the memory which
               has been allocated by processes, even if it has not been
               "used" by them as of yet.
A process which malloc()'s 1G @@ -1005,21 +1080,25 @@ Committed_AS: The amount of memory presently allocated on the system.                This is useful if one needs to guarantee that processes will                not fail due to lack of memory once that memory has been                successfully allocated. -VmallocTotal: total size of vmalloc memory area - VmallocUsed: amount of vmalloc area which is used -VmallocChunk: largest contiguous block of vmalloc area which is free -      Percpu: Memory allocated to the percpu allocator used to back percpu +VmallocTotal +              total size of vmalloc memory area +VmallocUsed +              amount of vmalloc area which is used +VmallocChunk +              largest contiguous block of vmalloc area which is free +Percpu +              Memory allocated to the percpu allocator used to back percpu                allocations. This stat excludes the cost of metadata. -.............................................................................. - -vmallocinfo: +vmallocinfo +~~~~~~~~~~~  Provides information about vmalloced/vmaped areas. One line per area,  containing the virtual address range of the area, size in bytes,  caller information of the creator, and optional information depending  on the kind of area : + ==========  ===================================================   pages=nr    number of pages   phys=addr   if a physical address was specified   ioremap     I/O mapping (ioremap() and friends) @@ -1029,49 +1108,54 @@ on the kind of area :   vpages      buffer for pages pointers was vmalloced (huge area)   N<node>=nr  (Only on NUMA kernels)               Number of pages allocated on memory node <node> - -> cat /proc/vmallocinfo -0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ... -  /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128 -0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ... 
-  /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
-0xffffc20000302000-0xffffc20000304000    8192 acpi_tb_verify_table+0x21/0x4f...
-  phys=7fee8000 ioremap
-0xffffc20000304000-0xffffc20000307000   12288 acpi_tb_verify_table+0x21/0x4f...
-  phys=7fee7000 ioremap
-0xffffc2000031d000-0xffffc2000031f000    8192 init_vdso_vars+0x112/0x210
-0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e ...
-  /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
-0xffffc2000033a000-0xffffc2000033d000   12288 sys_swapon+0x640/0xac0      ...
-  pages=2 vmalloc N1=2
-0xffffc20000347000-0xffffc2000034c000   20480 xt_alloc_table_info+0xfe ...
-  /0x130 [x_tables] pages=4 vmalloc N0=4
-0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 ...
-   pages=14 vmalloc N2=14
-0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 ...
-   pages=4 vmalloc N1=4
-0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 ...
-   pages=2 vmalloc N1=2
-0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
-   pages=10 vmalloc N0=10
-
-..............................................................................
-
-softirqs:
+ ==========  ===================================================
+
+::
+
+    > cat /proc/vmallocinfo
+    0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204 ...
+    /0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
+    0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204 ...
+    /0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
+    0xffffc20000302000-0xffffc20000304000    8192 acpi_tb_verify_table+0x21/0x4f...
+    phys=7fee8000 ioremap
+    0xffffc20000304000-0xffffc20000307000   12288 acpi_tb_verify_table+0x21/0x4f...
+    phys=7fee7000 ioremap
+    0xffffc2000031d000-0xffffc2000031f000    8192 init_vdso_vars+0x112/0x210
+    0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e ...
+    /0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
+    0xffffc2000033a000-0xffffc2000033d000   12288 sys_swapon+0x640/0xac0      ...
+    pages=2 vmalloc N1=2
+    0xffffc20000347000-0xffffc2000034c000   20480 xt_alloc_table_info+0xfe ...
+    /0x130 [x_tables] pages=4 vmalloc N0=4
+    0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 ...
+    pages=14 vmalloc N2=14
+    0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 ...
+    pages=4 vmalloc N1=4
+    0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 ...
+    pages=2 vmalloc N1=2
+    0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 ...
+    pages=10 vmalloc N0=10
+
+
+softirqs
+~~~~~~~~
 Provides counts of softirq handlers serviced since boot time, for each cpu.
-> cat /proc/softirqs
-                CPU0       CPU1       CPU2       CPU3
-      HI:          0          0          0          0
-   TIMER:      27166      27120      27097      27034
-  NET_TX:          0          0          0         17
-  NET_RX:         42          0          0         39
-   BLOCK:          0          0        107       1121
- TASKLET:          0          0          0        290
-   SCHED:      27035      26983      26971      26746
- HRTIMER:          0          0          0          0
-     RCU:       1678       1769       2178       2250
+::
+
+    > cat /proc/softirqs
+		    CPU0       CPU1       CPU2       CPU3
+	HI:          0          0          0          0
+    TIMER:      27166      27120      27097      27034
+    NET_TX:          0          0          0         17
+    NET_RX:         42          0          0         39
+    BLOCK:          0          0        107       1121
+    TASKLET:          0          0          0        290
+    SCHED:      27035      26983      26971      26746
+    HRTIMER:          0          0          0          0
+	RCU:       1678       1769       2178       2250
 1.3 IDE devices in /proc/ide
@@ -1083,7 +1167,7 @@ file drivers  and a link for each IDE device, pointing to the device directory
 in the controller specific subtree.
 The file  drivers  contains general information about the drivers used for the
-IDE devices:
+IDE devices::
   > cat /proc/ide/drivers
   ide-cdrom version 4.53
@@ -1094,57 +1178,61 @@ subdirectories. These  are  named  ide0,  ide1  and  so  on.  Each  of  these
 directories contains the files shown in table 1-6.
-Table 1-6: IDE controller info in  /proc/ide/ide?
-..............................................................................
- File    Content
- channel IDE channel (0 or 1)
- config  Configuration (only for PCI/IDE bridge)
- mate    Mate name
- model   Type/Chipset of IDE controller
-..............................................................................
+.. table:: Table 1-6: IDE controller info in  /proc/ide/ide?
+
+ ======= =======================================
+ File    Content
+ ======= =======================================
+ channel IDE channel (0 or 1)
+ config  Configuration (only for PCI/IDE bridge)
+ mate    Mate name
+ model   Type/Chipset of IDE controller
+ ======= =======================================
 Each device  connected  to  a  controller  has  a separate subdirectory in the
 controllers directory.  The  files  listed in table 1-7 are contained in these
 directories.
-Table 1-7: IDE device information
-..............................................................................
- File             Content
- cache            The cache
- capacity         Capacity of the medium (in 512Byte blocks)
- driver           driver and version
- geometry         physical and logical geometry
- identify         device identify block
- media            media type
- model            device identifier
- settings         device setup
- smart_thresholds IDE disk management thresholds
- smart_values     IDE disk management values
-..............................................................................
-
-The most  interesting  file is settings. This file contains a nice overview of
-the drive parameters:
-
-  # cat /proc/ide/ide0/hda/settings
-  name                    value           min             max             mode
-  ----                    -----           ---             ---             ----
-  bios_cyl                526             0               65535           rw
-  bios_head               255             0               255             rw
-  bios_sect               63              0               63              rw
-  breada_readahead        4               0               127             rw
-  bswap                   0               0               1               r
-  file_readahead          72              0               2097151         rw
-  io_32bit                0               0               3               rw
-  keepsettings            0               0               1               rw
-  max_kb_per_request      122             1               127             rw
-  multcount               0               0               8               rw
-  nice1                   1               0               1               rw
-  nowerr                  0               0               1               rw
-  pio_mode                write-only      0               255             w
-  slow                    0               0               1               rw
-  unmaskirq               0               0               1               rw
-  using_dma               0               0               1               rw
+.. table:: Table 1-7: IDE device information
+
+ ================ ==========================================
+ File             Content
+ ================ ==========================================
+ cache            The cache
+ capacity         Capacity of the medium (in 512Byte blocks)
+ driver           driver and version
+ geometry         physical and logical geometry
+ identify         device identify block
+ media            media type
+ model            device identifier
+ settings         device setup
+ smart_thresholds IDE disk management thresholds
+ smart_values     IDE disk management values
+ ================ ==========================================
+
+The most  interesting  file is ``settings``. This file contains a nice
+overview of the drive parameters::
+
+  # cat /proc/ide/ide0/hda/settings
+  name                    value           min             max             mode
+  ----                    -----           ---             ---             ----
+  bios_cyl                526             0               65535           rw
+  bios_head               255             0               255             rw
+  bios_sect               63              0               63              rw
+  breada_readahead        4               0               127             rw
+  bswap                   0               0               1               r
+  file_readahead          72              0               2097151         rw
+  io_32bit                0               0               3               rw
+  keepsettings            0               0               1               rw
+  max_kb_per_request      122             1               127             rw
+  multcount               0               0               8               rw
+  nice1                   1               0               1               rw
+  nowerr                  0               0               1               rw
+  pio_mode                write-only      0               255             w
+  slow                    0               0               1               rw
+  unmaskirq               0               0               1               rw
+  using_dma               0               0               1               rw
 1.4 Networking info in /proc/net
@@ -1155,67 +1243,70 @@ additional values  you  get  for  IP  version 6 if you configure the kernel to
 support this. Table 1-9 lists the files and their meaning.
-Table 1-8: IPv6 info in /proc/net
-..............................................................................
- File       Content
- udp6       UDP sockets (IPv6)
- tcp6       TCP sockets (IPv6)
- raw6       Raw device statistics (IPv6)
- igmp6      IP multicast addresses, which this host joined (IPv6)
- if_inet6   List of IPv6 interface addresses
- ipv6_route Kernel routing table for IPv6
- rt6_stats  Global IPv6 routing tables statistics
- sockstat6  Socket statistics (IPv6)
- snmp6      Snmp data (IPv6)
-..............................................................................
-
-
-Table 1-9: Network info in /proc/net
-..............................................................................
- File          Content
- arp           Kernel  ARP table
- dev           network devices with statistics
+.. table:: Table 1-8: IPv6 info in /proc/net
+
+ ========== =====================================================
+ File       Content
+ ========== =====================================================
+ udp6       UDP sockets (IPv6)
+ tcp6       TCP sockets (IPv6)
+ raw6       Raw device statistics (IPv6)
+ igmp6      IP multicast addresses, which this host joined (IPv6)
+ if_inet6   List of IPv6 interface addresses
+ ipv6_route Kernel routing table for IPv6
+ rt6_stats  Global IPv6 routing tables statistics
+ sockstat6  Socket statistics (IPv6)
+ snmp6      Snmp data (IPv6)
+ ========== =====================================================
+
+.. table:: Table 1-9: Network info in /proc/net
+
+ ============= ================================================================
+ File          Content
+ ============= ================================================================
+ arp           Kernel  ARP table
+ dev           network devices with statistics
  dev_mcast     the Layer2 multicast groups a device is listening too
                (interface index, label, number of references, number of bound
-               addresses).
- dev_stat      network device status
- ip_fwchains   Firewall chain linkage
- ip_fwnames    Firewall chain names
- ip_masq       Directory containing the masquerading tables
- ip_masquerade Major masquerading table
- netstat       Network statistics
- raw           raw device statistics
- route         Kernel routing table
- rpc           Directory containing rpc info
- rt_cache      Routing cache
- snmp          SNMP data
- sockstat      Socket statistics
- tcp           TCP  sockets
- udp           UDP sockets
- unix          UNIX domain sockets
- wireless      Wireless interface data (Wavelan etc)
- igmp          IP multicast addresses, which this host joined
- psched        Global packet scheduler parameters.
- netlink       List of PF_NETLINK sockets
- ip_mr_vifs    List of multicast virtual interfaces
- ip_mr_cache   List of multicast routing cache
-..............................................................................
+               addresses).
+ dev_stat      network device status
+ ip_fwchains   Firewall chain linkage
+ ip_fwnames    Firewall chain names
+ ip_masq       Directory containing the masquerading tables
+ ip_masquerade Major masquerading table
+ netstat       Network statistics
+ raw           raw device statistics
+ route         Kernel routing table
+ rpc           Directory containing rpc info
+ rt_cache      Routing cache
+ snmp          SNMP data
+ sockstat      Socket statistics
+ tcp           TCP  sockets
+ udp           UDP sockets
+ unix          UNIX domain sockets
+ wireless      Wireless interface data (Wavelan etc)
+ igmp          IP multicast addresses, which this host joined
+ psched        Global packet scheduler parameters.
+ netlink       List of PF_NETLINK sockets
+ ip_mr_vifs    List of multicast virtual interfaces
+ ip_mr_cache   List of multicast routing cache
+ ============= ================================================================
 You can  use  this  information  to see which network devices are available in
-your system and how much traffic was routed over those devices:
-
-  > cat /proc/net/dev
-  Inter-|Receive                                                   |[...
-   face |bytes    packets errs drop fifo frame compressed multicast|[...
-      lo:  908188   5596     0    0    0     0          0         0 [...
-    ppp0:15475140  20721   410    0    0   410          0         0 [...
-    eth0:  614530   7085     0    0    0     0          0         1 [...
-
-  ...] Transmit
-  ...] bytes    packets errs drop fifo colls carrier compressed
-  ...]  908188     5596    0    0    0     0       0          0
-  ...] 1375103    17405    0    0    0     0       0          0
-  ...] 1703981     5535    0    0    0     3       0          0
+your system and how much traffic was routed over those devices::
+
+  > cat /proc/net/dev
+  Inter-|Receive                                                   |[...
+   face |bytes    packets errs drop fifo frame compressed multicast|[...
+      lo:  908188   5596     0    0    0     0          0         0 [...
+    ppp0:15475140  20721   410    0    0   410          0         0 [...
+    eth0:  614530   7085     0    0    0     0          0         1 [...
+
+  ...] Transmit
+  ...] bytes    packets errs drop fifo colls carrier compressed
+  ...]  908188     5596    0    0    0     0       0          0
+  ...] 1375103    17405    0    0    0     0       0          0
+  ...] 1703981     5535    0    0    0     3       0          0
 In addition, each Channel Bond interface has its own directory.  For
 example, the bond0 device will have a directory called /proc/net/bond0/.
@@ -1228,62 +1319,62 @@ many times the slaves link has failed.
 If you  have  a  SCSI  host adapter in your system, you'll find a subdirectory
 named after  the driver for this adapter in /proc/scsi. You'll also see a list
-of all recognized SCSI devices in /proc/scsi:
+of all recognized SCSI devices in /proc/scsi::
-  >cat /proc/scsi/scsi
-  Attached devices:
-  Host: scsi0 Channel: 00 Id: 00 Lun: 00
-    Vendor: IBM      Model: DGHS09U          Rev: 03E0
-    Type:   Direct-Access                    ANSI SCSI revision: 03
-  Host: scsi0 Channel: 00 Id: 06 Lun: 00
-    Vendor: PIONEER  Model: CD-ROM DR-U06S   Rev: 1.04
-    Type:   CD-ROM                           ANSI SCSI revision: 02
+  >cat /proc/scsi/scsi
+  Attached devices:
+  Host: scsi0 Channel: 00 Id: 00 Lun: 00
+    Vendor: IBM      Model: DGHS09U          Rev: 03E0
+    Type:   Direct-Access                    ANSI SCSI revision: 03
+  Host: scsi0 Channel: 00 Id: 06 Lun: 00
+    Vendor: PIONEER  Model: CD-ROM DR-U06S   Rev: 1.04
+    Type:   CD-ROM                           ANSI SCSI revision: 02
 The directory  named  after  the driver has one file for each adapter found in
 the system.  These  files  contain information about the controller, including
 the used  IRQ  and  the  IO  address range. The amount of information shown is
 dependent on  the adapter you use. The example shows the output for an Adaptec
-AHA-2940 SCSI adapter:
-
-  > cat /proc/scsi/aic7xxx/0
-
-  Adaptec AIC7xxx driver version: 5.1.19/3.2.4
-  Compile Options:
-    TCQ Enabled By Default : Disabled
-    AIC7XXX_PROC_STATS     : Disabled
-    AIC7XXX_RESET_DELAY    : 5
-  Adapter Configuration:
-             SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
-                             Ultra Wide Controller
-      PCI MMAPed I/O Base: 0xeb001000
-   Adapter SEEPROM Config: SEEPROM found and used.
-        Adaptec SCSI BIOS: Enabled
-                      IRQ: 10
-                     SCBs: Active 0, Max Active 2,
-                           Allocated 15, HW 16, Page 255
-               Interrupts: 160328
-        BIOS Control Word: 0x18b6
-     Adapter Control Word: 0x005b
-     Extended Translation: Enabled
-  Disconnect Enable Flags: 0xffff
-       Ultra Enable Flags: 0x0001
-   Tag Queue Enable Flags: 0x0000
-  Ordered Queue Tag Flags: 0x0000
-  Default Tag Queue Depth: 8
-      Tagged Queue By Device array for aic7xxx host instance 0:
-        {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
-      Actual queue depth per device for aic7xxx host instance 0:
-        {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
-  Statistics:
-  (scsi0:0:0:0)
-    Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8
-    Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0)
-    Total transfers 160151 (74577 reads and 85574 writes)
-  (scsi0:0:6:0)
-    Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15
-    Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0)
-    Total transfers 0 (0 reads and 0 writes)
+AHA-2940 SCSI adapter::
+
+  > cat /proc/scsi/aic7xxx/0
+
+  Adaptec AIC7xxx driver version: 5.1.19/3.2.4
+  Compile Options:
+    TCQ Enabled By Default : Disabled
+    AIC7XXX_PROC_STATS     : Disabled
+    AIC7XXX_RESET_DELAY    : 5
+  Adapter Configuration:
+             SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
+                             Ultra Wide Controller
+      PCI MMAPed I/O Base: 0xeb001000
+   Adapter SEEPROM Config: SEEPROM found and used.
+        Adaptec SCSI BIOS: Enabled
+                      IRQ: 10
+                     SCBs: Active 0, Max Active 2,
+                           Allocated 15, HW 16, Page 255
+               Interrupts: 160328
+        BIOS Control Word: 0x18b6
+     Adapter Control Word: 0x005b
+     Extended Translation: Enabled
+  Disconnect Enable Flags: 0xffff
+       Ultra Enable Flags: 0x0001
+   Tag Queue Enable Flags: 0x0000
+  Ordered Queue Tag Flags: 0x0000
+  Default Tag Queue Depth: 8
+      Tagged Queue By Device array for aic7xxx host instance 0:
+        {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
+      Actual queue depth per device for aic7xxx host instance 0:
+        {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
+  Statistics:
+  (scsi0:0:0:0)
+    Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8
+    Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0)
+    Total transfers 160151 (74577 reads and 85574 writes)
+  (scsi0:0:6:0)
+    Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15
+    Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0)
+    Total transfers 0 (0 reads and 0 writes)
 1.6 Parallel port info in /proc/parport
@@ -1296,18 +1387,20 @@ number (0,1,2,...).
 These directories contain the four files shown in Table 1-10.
-Table 1-10: Files in /proc/parport
-..............................................................................
- File      Content
- autoprobe Any IEEE-1284 device ID information that has been acquired.
+.. table:: Table 1-10: Files in /proc/parport
+
+ ========= ====================================================================
+ File      Content
+ ========= ====================================================================
+ autoprobe Any IEEE-1284 device ID information that has been acquired.
  devices   list of the device drivers using that port. A + will appear by the
            name of the device currently using the port (it might not appear
-           against any).
- hardware  Parallel port's base address, IRQ line and DMA channel.
+           against any).
+ hardware  Parallel port's base address, IRQ line and DMA channel.
  irq       IRQ that parport is using for that port. This is in a separate
            file to allow you to alter it by writing a new value in (IRQ
-           number or none).
-..............................................................................
+           number or none).
+ ========= ====================================================================
 1.7 TTY info in /proc/tty
 -------------------------
@@ -1317,29 +1410,31 @@ directory /proc/tty.You'll  find  entries  for drivers and line disciplines in
 this directory, as shown in Table 1-11.
-Table 1-11: Files in /proc/tty
-..............................................................................
- File          Content
- drivers       list of drivers and their usage
- ldiscs        registered line disciplines
- driver/serial usage statistic and status of single tty lines
-..............................................................................
+.. table:: Table 1-11: Files in /proc/tty
+
+ ============= ==============================================
+ File          Content
+ ============= ==============================================
+ drivers       list of drivers and their usage
+ ldiscs        registered line disciplines
+ driver/serial usage statistic and status of single tty lines
+ ============= ==============================================
 To see  which  tty's  are  currently in use, you can simply look into the file
-/proc/tty/drivers:
-
-  > cat /proc/tty/drivers
-  pty_slave            /dev/pts      136   0-255 pty:slave
-  pty_master           /dev/ptm      128   0-255 pty:master
-  pty_slave            /dev/ttyp       3   0-255 pty:slave
-  pty_master           /dev/pty        2   0-255 pty:master
-  serial               /dev/cua        5   64-67 serial:callout
-  serial               /dev/ttyS       4   64-67 serial
-  /dev/tty0            /dev/tty0       4       0 system:vtmaster
-  /dev/ptmx            /dev/ptmx       5       2 system
-  /dev/console         /dev/console    5       1 system:console
-  /dev/tty             /dev/tty        5       0 system:/dev/tty
-  unknown              /dev/tty        4    1-63 console
+/proc/tty/drivers::
+
+  > cat /proc/tty/drivers
+  pty_slave            /dev/pts      136   0-255 pty:slave
+  pty_master           /dev/ptm      128   0-255 pty:master
+  pty_slave            /dev/ttyp       3   0-255 pty:slave
+  pty_master           /dev/pty        2   0-255 pty:master
+  serial               /dev/cua        5   64-67 serial:callout
+  serial               /dev/ttyS       4   64-67 serial
+  /dev/tty0            /dev/tty0       4       0 system:vtmaster
+  /dev/ptmx            /dev/ptmx       5       2 system
+  /dev/console         /dev/console    5       1 system:console
+  /dev/tty             /dev/tty        5       0 system:/dev/tty
+  unknown              /dev/tty        4    1-63 console
 1.8 Miscellaneous kernel statistics in /proc/stat
@@ -1347,7 +1442,7 @@ To see  which  tty's  are  currently in use, you can simply look into the file
 Various pieces   of  information about  kernel activity  are  available in the
 /proc/stat file.  All  of  the numbers reported  in  this file are  aggregates
-since the system first booted.  For a quick look, simply cat the file:
+since the system first booted.  For a quick look, simply cat the file::
   > cat /proc/stat
   cpu  2255 34 2290 22625563 6290 127 456 0 0 0
@@ -1372,6 +1467,7 @@ second).  The meanings of the columns are as follows, from left to right:
 - idle: twiddling thumbs
 - iowait: In a word, iowait stands for waiting for I/O to complete. But there
   are several problems:
+
   1. Cpu will not wait for I/O to complete, iowait is the time that a task is
      waiting for I/O to complete. When cpu goes into idle state for
      outstanding task io, another task will be scheduled on this CPU.
@@ -1379,6 +1475,7 @@ second).  The meanings of the columns are as follows, from left to right:
      on any CPU, so the iowait of each CPU is difficult to calculate.
   3. The value of iowait field in /proc/stat will decrease in certain
      conditions.
+
  So, the iowait is not reliable by reading from /proc/stat.
 - irq: servicing interrupts
 - softirq: servicing softirqs
@@ -1422,18 +1519,19 @@ Information about mounted ext4 file systems can be found in
 /proc/fs/ext4/dm-0).   The files in each per-device directory are shown
 in Table 1-12, below.
-Table 1-12: Files in /proc/fs/ext4/<devname>
-..............................................................................
- File            Content
+.. table:: Table 1-12: Files in /proc/fs/ext4/<devname>
+
+ ==============  ==========================================================
+ File            Content
  mb_groups       details of multiblock allocator buddy cache of free blocks
-..............................................................................
+ ==============  ==========================================================
 2.0 /proc/consoles
 ------------------
 Shows registered system console lines.
 To see which character device lines are currently used for the system console
-/dev/console, you may simply look into the file /proc/consoles:
+/dev/console, you may simply look into the file /proc/consoles::
   > cat /proc/consoles
   tty0                 -WU (ECp)       4:7
@@ -1441,41 +1539,45 @@ To see which character device lines are currently used for the system console
 The columns are:
-  device               name of the device
-  operations           R = can do read operations
-                       W = can do write operations
-                       U = can do unblank
-  flags                E = it is enabled
-                       C = it is preferred console
-                       B = it is primary boot console
-                       p = it is used for printk buffer
-                       b = it is not a TTY but a Braille device
-                       a = it is safe to use when cpu is offline
-  major:minor          major and minor number of the device separated by a colon
++--------------------+-------------------------------------------------------+
+| device             | name of the device                                    |
++====================+=======================================================+
+| operations         | * R = can do read operations                          |
+|                    | * W = can do write operations                         |
+|                    | * U = can do unblank                                  |
++--------------------+-------------------------------------------------------+
+| flags              | * E = it is enabled                                   |
+|                    | * C = it is preferred console                         |
+|                    | * B = it is primary boot console                      |
+|                    | * p = it is used for printk buffer                    |
+|                    | * b = it is not a TTY but a Braille device            |
+|                    | * a = it is safe to use when cpu is offline           |
++--------------------+-------------------------------------------------------+
+| major:minor        | major and minor number of the device separated by a   |
+|                    | colon                                                 |
++--------------------+-------------------------------------------------------+
-------------------------------------------------------------------------------
 Summary
-------------------------------------------------------------------------------
+-------
+
 The /proc file system serves information about the running system. It not only
 allows access to process data but also allows you to request the kernel status
 by reading files in the hierarchy.
 The directory  structure  of /proc reflects the types of information and makes
 it easy, if not obvious, where to look for specific data.
------------------------------------------------------------------------------- ------------------------------------------------------------------------------- -CHAPTER 2: MODIFYING SYSTEM PARAMETERS ------------------------------------------------------------------------------- +Chapter 2: Modifying System Parameters +====================================== -------------------------------------------------------------------------------  In This Chapter ------------------------------------------------------------------------------- +--------------- +  * Modifying kernel parameters by writing into files found in /proc/sys  * Exploring the files which modify certain parameters  * Review of the /proc/sys file tree ------------------------------------------------------------------------------- +------------------------------------------------------------------------------  A very  interesting part of /proc is the directory /proc/sys. This is not only  a source  of  information,  it also allows you to change parameters within the @@ -1503,19 +1605,18 @@ kernels, and became part of it in version 2.2.1 of the Linux kernel.  Please see: Documentation/admin-guide/sysctl/ directory for descriptions of these  entries. -------------------------------------------------------------------------------  Summary ------------------------------------------------------------------------------- +------- +  Certain aspects  of  kernel  behavior  can be modified at runtime, without the  need to  recompile  the kernel, or even to reboot the system. The files in the  /proc/sys tree  can  not only be read, but also modified. You can use the echo  command to write value into these files, thereby changing the default settings  of the kernel. 
------------------------------------------------------------------------------- ------------------------------------------------------------------------------- -CHAPTER 3: PER-PROCESS PARAMETERS ------------------------------------------------------------------------------- + +Chapter 3: Per-process Parameters +=================================  3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score  -------------------------------------------------------------------------------- @@ -1588,26 +1689,28 @@ process should be killed in an out-of-memory situation.  This file contains IO statistics for each running process  Example -------- +~~~~~~~ + +:: -test:/tmp # dd if=/dev/zero of=/tmp/test.dat & -[1] 3828 +    test:/tmp # dd if=/dev/zero of=/tmp/test.dat & +    [1] 3828 -test:/tmp # cat /proc/3828/io -rchar: 323934931 -wchar: 323929600 -syscr: 632687 -syscw: 632675 -read_bytes: 0 -write_bytes: 323932160 -cancelled_write_bytes: 0 +    test:/tmp # cat /proc/3828/io +    rchar: 323934931 +    wchar: 323929600 +    syscr: 632687 +    syscw: 632675 +    read_bytes: 0 +    write_bytes: 323932160 +    cancelled_write_bytes: 0  Description ------------ +~~~~~~~~~~~  rchar ------ +^^^^^  I/O counter: chars read  The number of bytes which this task has caused to be read from storage. This @@ -1618,7 +1721,7 @@ pagecache)  wchar ------ +^^^^^  I/O counter: chars written  The number of bytes which this task has caused, or shall cause to be written @@ -1626,7 +1729,7 @@ to disk. Similar caveats apply here as with rchar.  syscr ------ +^^^^^  I/O counter: read syscalls  Attempt to count the number of read I/O operations, i.e. syscalls like read() @@ -1634,7 +1737,7 @@ and pread().  syscw ------ +^^^^^  I/O counter: write syscalls  Attempt to count the number of write I/O operations, i.e. syscalls like @@ -1642,7 +1745,7 @@ write() and pwrite().  
read_bytes ----------- +^^^^^^^^^^  I/O counter: bytes read  Attempt to count the number of bytes which this process really did cause to @@ -1652,7 +1755,7 @@ CIFS at a later time>  write_bytes ------------ +^^^^^^^^^^^  I/O counter: bytes written  Attempt to count the number of bytes which this process caused to be sent to @@ -1660,7 +1763,7 @@ the storage layer. This is done at page-dirtying time.  cancelled_write_bytes ---------------------- +^^^^^^^^^^^^^^^^^^^^^  The big inaccuracy here is truncate. If a process writes 1MB to a file and  then deletes the file, it will in fact perform no writeout. But it will have @@ -1673,12 +1776,11 @@ from the truncating task's write_bytes, but there is information loss in doing  that. -Note ----- +.. Note:: -At its current implementation state, this is a bit racy on 32-bit machines: if -process A reads process B's /proc/pid/io while process B is updating one of -those 64-bit counters, process A could see an intermediate result. +   At its current implementation state, this is a bit racy on 32-bit machines: +   if process A reads process B's /proc/pid/io while process B is updating one +   of those 64-bit counters, process A could see an intermediate result.  More information about this can be found within the taskstats documentation in @@ -1698,12 +1800,13 @@ of memory types. If a bit of the bitmask is set, memory segments of the  corresponding memory type are dumped, otherwise they are not dumped.  
The following 9 memory types are supported: +    - (bit 0) anonymous private memory    - (bit 1) anonymous shared memory    - (bit 2) file-backed private memory    - (bit 3) file-backed shared memory    - (bit 4) ELF header pages in file-backed private memory areas (it is -            effective only if the bit 2 is cleared) +    effective only if the bit 2 is cleared)    - (bit 5) hugetlb private memory    - (bit 6) hugetlb shared memory    - (bit 7) DAX private memory @@ -1719,13 +1822,13 @@ The default value of coredump_filter is 0x33; this means all anonymous memory  segments, ELF header pages and hugetlb private memory are dumped.  If you don't want to dump all shared memory segments attached to pid 1234, -write 0x31 to the process's proc file. +write 0x31 to the process's proc file::    $ echo 0x31 > /proc/1234/coredump_filter  When a new process is created, the process inherits the bitmask status from its  parent. It is useful to set up coredump_filter before the program runs. -For example: +For example::    $ echo 0x7 > /proc/self/coredump_filter    $ ./some_program @@ -1733,35 +1836,37 @@ For example:  3.5	/proc/<pid>/mountinfo - Information about mounts  -------------------------------------------------------- -This file contains lines of the form: +This file contains lines of the form:: -36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue -(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11) +    36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue +    (1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11) -(1) mount ID:  unique identifier of the mount (may be reused after umount) -(2) parent ID:  ID of parent (or of self for the top of the mount tree) -(3) major:minor:  value of st_dev for files on filesystem -(4) root:  root of the mount within the filesystem -(5) mount point:  mount point relative to the process's root -(6) mount options:  per mount options -(7) 
optional fields:  zero or more fields of the form "tag[:value]" -(8) separator:  marks the end of the optional fields -(9) filesystem type:  name of filesystem of the form "type[.subtype]" -(10) mount source:  filesystem specific information or "none" -(11) super options:  per super block options +    (1) mount ID:  unique identifier of the mount (may be reused after umount) +    (2) parent ID:  ID of parent (or of self for the top of the mount tree) +    (3) major:minor:  value of st_dev for files on filesystem +    (4) root:  root of the mount within the filesystem +    (5) mount point:  mount point relative to the process's root +    (6) mount options:  per mount options +    (7) optional fields:  zero or more fields of the form "tag[:value]" +    (8) separator:  marks the end of the optional fields +    (9) filesystem type:  name of filesystem of the form "type[.subtype]" +    (10) mount source:  filesystem specific information or "none" +    (11) super options:  per super block options  Parsers should ignore all unrecognised optional fields.  Currently the  possible optional fields are: -shared:X  mount is shared in peer group X -master:X  mount is slave to peer group X -propagate_from:X  mount is slave and receives propagation from peer group X (*) -unbindable  mount is unbindable +================  ============================================================== +shared:X          mount is shared in peer group X +master:X          mount is slave to peer group X +propagate_from:X  mount is slave and receives propagation from peer group X [#]_ +unbindable        mount is unbindable +================  ============================================================== -(*) X is the closest dominant peer group under the process's root.  If -X is the immediate master of the mount, or if there's no dominant peer -group under the same root, then only the "master:X" field is present -and not the "propagate_from:X" field. +.. 
[#] X is the closest dominant peer group under the process's root.  If +       X is the immediate master of the mount, or if there's no dominant peer +       group under the same root, then only the "master:X" field is present +       and not the "propagate_from:X" field.  For more information on mount propagation see: @@ -1804,77 +1909,86 @@ created with [see open(2) for details] and 'mnt_id' represents mount ID of  the file system containing the opened file [see 3.5 /proc/<pid>/mountinfo  for details]. -A typical output is +A typical output is::  	pos:	0  	flags:	0100002  	mnt_id:	19 -All locks associated with a file descriptor are shown in its fdinfo too. +All locks associated with a file descriptor are shown in its fdinfo too:: -lock:       1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF +    lock:       1: FLOCK  ADVISORY  WRITE 359 00:13:11691 0 EOF  The files such as eventfd, fsnotify, signalfd, epoll among the regular pos/flags  pair provide additional information particular to the objects they represent. -	Eventfd files -	~~~~~~~~~~~~~ +Eventfd files +~~~~~~~~~~~~~ + +:: +  	pos:	0  	flags:	04002  	mnt_id:	9  	eventfd-count:	5a -	where 'eventfd-count' is hex value of a counter. +where 'eventfd-count' is hex value of a counter. + +Signalfd files +~~~~~~~~~~~~~~ + +:: -	Signalfd files -	~~~~~~~~~~~~~~  	pos:	0  	flags:	04002  	mnt_id:	9  	sigmask:	0000000000000200 -	where 'sigmask' is hex value of the signal mask associated -	with a file. +where 'sigmask' is hex value of the signal mask associated +with a file. + +Epoll files +~~~~~~~~~~~ + +:: -	Epoll files -	~~~~~~~~~~~  	pos:	0  	flags:	02  	mnt_id:	9  	tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7 -	where 'tfd' is a target file descriptor number in decimal form, -	'events' is events mask being watched and the 'data' is data -	associated with a target [see epoll(7) for more details]. 
+where 'tfd' is a target file descriptor number in decimal form, +'events' is events mask being watched and the 'data' is data +associated with a target [see epoll(7) for more details]. -	The 'pos' is current offset of the target file in decimal form -	[see lseek(2)], 'ino' and 'sdev' are inode and device numbers -	where target file resides, all in hex format. +The 'pos' is current offset of the target file in decimal form +[see lseek(2)], 'ino' and 'sdev' are inode and device numbers +where target file resides, all in hex format. -	Fsnotify files -	~~~~~~~~~~~~~~ -	For inotify files the format is the following +Fsnotify files +~~~~~~~~~~~~~~ +For inotify files the format is the following::  	pos:	0  	flags:	02000000  	inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d -	where 'wd' is a watch descriptor in decimal form, ie a target file -	descriptor number, 'ino' and 'sdev' are inode and device where the -	target file resides and the 'mask' is the mask of events, all in hex -	form [see inotify(7) for more details]. +where 'wd' is a watch descriptor in decimal form, ie a target file +descriptor number, 'ino' and 'sdev' are inode and device where the +target file resides and the 'mask' is the mask of events, all in hex +form [see inotify(7) for more details]. -	If the kernel was built with exportfs support, the path to the target -	file is encoded as a file handle.  The file handle is provided by three -	fields 'fhandle-bytes', 'fhandle-type' and 'f_handle', all in hex -	format. +If the kernel was built with exportfs support, the path to the target +file is encoded as a file handle.  The file handle is provided by three +fields 'fhandle-bytes', 'fhandle-type' and 'f_handle', all in hex +format. -	If the kernel is built without exportfs support the file handle won't be -	printed out. +If the kernel is built without exportfs support the file handle won't be +printed out. 
-	If there is no inotify mark attached yet the 'inotify' line will be omitted. +If there is no inotify mark attached yet the 'inotify' line will be omitted. -	For fanotify files the format is +For fanotify files the format is::  	pos:	0  	flags:	02 @@ -1883,20 +1997,22 @@ pair provide additional information particular to the objects they represent.  	fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003  	fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4 -	where fanotify 'flags' and 'event-flags' are values used in fanotify_init -	call, 'mnt_id' is the mount point identifier, 'mflags' is the value of -	flags associated with mark which are tracked separately from events -	mask. 'ino', 'sdev' are target inode and device, 'mask' is the events -	mask and 'ignored_mask' is the mask of events which are to be ignored. -	All in hex format. Incorporation of 'mflags', 'mask' and 'ignored_mask' -	does provide information about flags and mask used in fanotify_mark -	call [see fsnotify manpage for details]. +where fanotify 'flags' and 'event-flags' are values used in fanotify_init +call, 'mnt_id' is the mount point identifier, 'mflags' is the value of +flags associated with mark which are tracked separately from events +mask. 'ino', 'sdev' are target inode and device, 'mask' is the events +mask and 'ignored_mask' is the mask of events which are to be ignored. +All in hex format. Incorporation of 'mflags', 'mask' and 'ignored_mask' +does provide information about flags and mask used in fanotify_mark +call [see fsnotify manpage for details]. + +While the first three lines are mandatory and always printed, the rest is +optional and may be omitted if no marks created yet. -	While the first three lines are mandatory and always printed, the rest is -	optional and may be omitted if no marks created yet. 
+Timerfd files +~~~~~~~~~~~~~ -	Timerfd files -	~~~~~~~~~~~~~ +::  	pos:	0  	flags:	02 @@ -1907,18 +2023,18 @@ pair provide additional information particular to the objects they represent.  	it_value: (0, 49406829)  	it_interval: (1, 0) -	where 'clockid' is the clock type and 'ticks' is the number of the timer expirations -	that have occurred [see timerfd_create(2) for details]. 'settime flags' are -	flags in octal form been used to setup the timer [see timerfd_settime(2) for -	details]. 'it_value' is remaining time until the timer exiration. -	'it_interval' is the interval for the timer. Note the timer might be set up -	with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value' -	still exhibits timer's remaining time. +where 'clockid' is the clock type and 'ticks' is the number of the timer expirations +that have occurred [see timerfd_create(2) for details]. 'settime flags' are +flags in octal form been used to setup the timer [see timerfd_settime(2) for +details]. 'it_value' is remaining time until the timer exiration. +'it_interval' is the interval for the timer. Note the timer might be set up +with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value' +still exhibits timer's remaining time.  3.9	/proc/<pid>/map_files - Information about memory mapped files  ---------------------------------------------------------------------  This directory contains symbolic links which represent memory mapped files -the process is maintaining.  Example output: +the process is maintaining.  Example output::       | lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so       | lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so @@ -1976,17 +2092,22 @@ When CONFIG_PROC_PID_ARCH_STATUS is enabled, this file displays the  architecture specific status of the task.  
Example -------- +~~~~~~~ + +:: +   $ cat /proc/6753/arch_status   AVX512_elapsed_ms:      8  Description ------------ +~~~~~~~~~~~  x86 specific entries: ---------------------- - AVX512_elapsed_ms: - ------------------ +~~~~~~~~~~~~~~~~~~~~~ + +AVX512_elapsed_ms: +^^^^^^^^^^^^^^^^^^ +    If AVX512 is supported on the machine, this entry shows the milliseconds    elapsed since the last time AVX512 usage was recorded. The recording    happens on a best effort basis when a task is scheduled out. This means @@ -2010,17 +2131,18 @@ x86 specific entries:    the task is unlikely an AVX512 user, but depends on the workload and the    scheduling scenario, it also could be a false negative mentioned above. -------------------------------------------------------------------------------  Configuring procfs ------------------------------------------------------------------------------- +------------------  4.1	Mount options  ---------------------  The following mount options are supported: +	=========	========================================================  	hidepid=	Set /proc/<pid>/ access mode.  	gid=		Set the group authorized to learn processes information. +	=========	========================================================  hidepid=0 means classic mode - everybody may access all /proc/<pid>/ directories  (default). diff --git a/Documentation/filesystems/qnx6.txt b/Documentation/filesystems/qnx6.rst index 48ea68f15845..fd13433d362c 100644 --- a/Documentation/filesystems/qnx6.txt +++ b/Documentation/filesystems/qnx6.rst @@ -1,3 +1,6 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================  The QNX6 Filesystem  =================== @@ -14,10 +17,12 @@ Specification  qnx6fs shares many properties with traditional Unix filesystems. It has the  concepts of blocks, inodes and directories. +  On QNX it is possible to create little endian and big endian qnx6 filesystems.  
This feature makes it possible to create and use a different endianness fs  for the target (QNX is used on quite a range of embedded systems) platform  running on a different endianness. +  The Linux driver handles endianness transparently. (LE and BE)  Blocks @@ -26,6 +31,7 @@ Blocks  The space in the device or file is split up into blocks. These are a fixed  size of 512, 1024, 2048 or 4096, which is decided when the filesystem is  created. +  Blockpointers are 32bit, so the maximum space that can be addressed is  2^32 * 4096 bytes or 16TB @@ -50,6 +56,7 @@ Each of these root nodes holds information like total size of the stored  data and the addressing levels in that specific tree.  If the level value is 0, up to 16 direct blocks can be addressed by each  node. +  Level 1 adds an additional indirect addressing level where each indirect  addressing block holds up to blocksize / 4 bytes pointers to data blocks.  Level 2 adds an additional indirect addressing block level (so, already up @@ -57,11 +64,13 @@ to 16 * 256 * 256 = 1048576 blocks that can be addressed by such a tree).  Unused block pointers are always set to ~0 - regardless of root node,  indirect addressing blocks or inodes. +  Data leaves are always on the lowest level. So no data is stored on upper  tree levels.  The first Superblock is located at 0x2000. (0x2000 is the bootblock size)  The Audi MMI 3G first superblock directly starts at byte 0. +  Second superblock position can either be calculated from the superblock  information (total number of filesystem blocks) or by taking the highest  device address, zeroing the last 3 bytes and then subtracting 0x1000 from @@ -84,6 +93,7 @@ Object mode field is POSIX format. (which makes things easier)  There are also pointers to the first 16 blocks, if the object data can be  addressed with 16 direct blocks. +  For more than 16 blocks an indirect addressing in form of another tree is  used. 
(scheme is the same as the one used for the superblock root nodes) @@ -96,13 +106,18 @@ Directories  A directory is a filesystem object and has an inode just like a file.  It is a specially formatted file containing records which associate each  name with an inode number. +  '.' inode number points to the directory inode +  '..' inode number points to the parent directory inode +  Eeach filename record additionally got a filename length field.  One special case are long filenames or subdirectory names. +  These got set a filename length field of 0xff in the corresponding directory  record plus the longfile inode number also stored in that record. +  With that longfilename inode number, the longfilename tree can be walked  starting with the superblock longfilename root node pointers. @@ -111,6 +126,7 @@ Special files  Symbolic links are also filesystem objects with inodes. They got a specific  bit in the inode mode field identifying them as symbolic link. +  The directory entry file inode pointer points to the target file inode.  Hard links got an inode, a directory entry, but a specific mode bit set, @@ -126,9 +142,11 @@ Long filenames  Long filenames are stored in a separate addressing tree. The staring point  is the longfilename root node in the active superblock. +  Each data block (tree leaves) holds one long filename. That filename is  limited to 510 bytes. The first two starting bytes are used as length field  for the actual filename. +  If that structure shall fit for all allowed blocksizes, it is clear why there  is a limit of 510 bytes for the actual filename stored. @@ -138,6 +156,7 @@ Bitmap  The qnx6fs filesystem allocation bitmap is stored in a tree under bitmap  root node in the superblock and each bit in the bitmap represents one  filesystem block. +  The first block is block 0, which starts 0x1000 after superblock start.  So for a normal qnx6fs 0x3000 (bootblock + superblock) is the physical  address at which block 0 is located. 
@@ -149,11 +168,14 @@ Bitmap system area  ------------------  The bitmap itself is divided into three parts. +  First the system area, that is split into two halves. +  Then userspace.  The requirement for a static, fixed preallocated system area comes from how  qnx6fs deals with writes. +  Each superblock got it's own half of the system area. So superblock #1  always uses blocks from the lower half while superblock #2 just writes to  blocks represented by the upper half bitmap system area bits. @@ -163,7 +185,7 @@ tree structures are treated as system blocks.  The rational behind that is that a write request can work on a new snapshot  (system area of the inactive - resp. lower serial numbered superblock) while -at the same time there is still a complete stable filesystem structer in the +at the same time there is still a complete stable filesystem structure in the  other half of the system area.  When finished with writing (a sync write is completed, the maximum sync leap diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.rst index 97d42ccaa92d..6c576e241d86 100644 --- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt +++ b/Documentation/filesystems/ramfs-rootfs-initramfs.rst @@ -1,5 +1,11 @@ -ramfs, rootfs and initramfs +.. SPDX-License-Identifier: GPL-2.0 + +=========================== +Ramfs, rootfs and initramfs +=========================== +  October 17, 2005 +  Rob Landley <[email protected]>  ============================= @@ -99,14 +105,14 @@ out of that.  All this differs from the old initrd in several ways:    - The old initrd was always a separate file, while the initramfs archive is -    linked into the linux kernel image.  (The directory linux-*/usr is devoted -    to generating this archive during the build.) +    linked into the linux kernel image.  (The directory ``linux-*/usr`` is +    devoted to generating this archive during the build.)    
- The old initrd file was a gzipped filesystem image (in some file format,      such as ext2, that needed a driver built into the kernel), while the new      initramfs archive is a gzipped cpio archive (like tar only simpler, -    see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst).  The -    kernel's cpio extraction code is not only extremely small, it's also +    see cpio(1) and Documentation/driver-api/early-userspace/buffer-format.rst). +    The kernel's cpio extraction code is not only extremely small, it's also      __init text and data that can be discarded during the boot process.    - The program run by the old initrd (which was called /initrd, not /init) did @@ -139,7 +145,7 @@ and living in usr/Kconfig) can be used to specify a source for the  initramfs archive, which will automatically be incorporated into the  resulting binary.  This option can point to an existing gzipped cpio  archive, a directory containing files to be archived, or a text file -specification such as the following example: +specification such as the following example::    dir /dev 755 0 0    nod /dev/console 644 0 0 c 5 1 @@ -175,12 +181,12 @@ or extracting your own preprepared cpio files to feed to the kernel build  (instead of a config file or directory).  The following command line can extract a cpio image (either by the above script -or by the kernel build) back into its component files: +or by the kernel build) back into its component files::    cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames  The following shell script can create a prebuilt cpio archive you can -use in place of the above config file: +use in place of the above config file::    #!/bin/sh @@ -202,14 +208,17 @@ use in place of the above config file:      exit 1    fi -Note: The cpio man page contains some bad advice that will break your initramfs -archive if you follow it.  
It says "A typical way to generate the list -of filenames is with the find command; you should give find the -depth option -to minimize problems with permissions on directories that are unwritable or not -searchable."  Don't do this when creating initramfs.cpio.gz images, it won't -work.  The Linux kernel cpio extractor won't create files in a directory that -doesn't exist, so the directory entries must go before the files that go in -those directories.  The above script gets them in the right order. +.. Note:: + +   The cpio man page contains some bad advice that will break your initramfs +   archive if you follow it.  It says "A typical way to generate the list +   of filenames is with the find command; you should give find the -depth +   option to minimize problems with permissions on directories that are +   unwritable or not searchable."  Don't do this when creating +   initramfs.cpio.gz images, it won't work.  The Linux kernel cpio extractor +   won't create files in a directory that doesn't exist, so the directory +   entries must go before the files that go in those directories. +   The above script gets them in the right order.  External initramfs images:  -------------------------- @@ -236,9 +245,10 @@ An initramfs archive is a complete self-contained root filesystem for Linux.  
If you don't already understand what shared libraries, devices, and paths  you need to get a minimal root filesystem up and running, here are some  references: -http://www.tldp.org/HOWTO/Bootdisk-HOWTO/ -http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html -http://www.linuxfromscratch.org/lfs/view/stable/ + +- http://www.tldp.org/HOWTO/Bootdisk-HOWTO/ +- http://www.tldp.org/HOWTO/From-PowerUp-To-Bash-Prompt-HOWTO.html +- http://www.linuxfromscratch.org/lfs/view/stable/  The "klibc" package (http://www.kernel.org/pub/linux/libs/klibc) is  designed to be a tiny C library to statically link early userspace @@ -255,7 +265,7 @@ name lookups, even when otherwise statically linked.)  A good first step is to get initramfs to run a statically linked "hello world"  program as init, and test it under an emulator like qemu (www.qemu.org) or -User Mode Linux, like so: +User Mode Linux, like so::    cat > hello.c << EOF    #include <stdio.h> @@ -326,8 +336,8 @@ the above threads) is:     explained his reasoning: -      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html -      http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html +     - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html +     - http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html     and, most importantly, designed and implemented the initramfs code. diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.rst index cd709a94d054..04ad083cfe62 100644 --- a/Documentation/filesystems/relay.txt +++ b/Documentation/filesystems/relay.rst @@ -1,3 +1,6 @@ +.. SPDX-License-Identifier: GPL-2.0 + +==================================  relay interface (formerly relayfs)  ================================== @@ -108,6 +111,7 @@ The relay interface implements basic file operations for user space  access to relay channel buffer data.  
Here are the file operations  that are available and some comments regarding their behavior: +=========== ============================================================  open()	    enables user to open an _existing_ channel buffer.  mmap()      results in channel buffer being mapped into the caller's @@ -136,13 +140,16 @@ poll()      POLLIN/POLLRDNORM/POLLERR supported.  User applications are  close()     decrements the channel buffer's refcount.  When the refcount  	    reaches 0, i.e. when no process or kernel client has the  	    buffer open, the channel buffer is freed. +=========== ============================================================  In order for a user application to make use of relay files, the -host filesystem must be mounted.  For example, +host filesystem must be mounted.  For example::  	mount -t debugfs debugfs /sys/kernel/debug -NOTE:   the host filesystem doesn't need to be mounted for kernel +.. Note:: + +	the host filesystem doesn't need to be mounted for kernel  	clients to create or use channels - it only needs to be  	mounted when user space applications need access to the buffer  	data. @@ -154,7 +161,7 @@ The relay interface kernel API  Here's a summary of the API the relay interface provides to in-kernel clients:  TBD(curr. line MT:/API/) -  channel management functions: +  channel management functions::      relay_open(base_filename, parent, subbuf_size, n_subbufs,                 callbacks, private_data) @@ -162,17 +169,17 @@ TBD(curr. 
line MT:/API/)      relay_flush(chan)      relay_reset(chan) -  channel management typically called on instigation of userspace: +  channel management typically called on instigation of userspace::      relay_subbufs_consumed(chan, cpu, subbufs_consumed) -  write functions: +  write functions::      relay_write(chan, data, length)      __relay_write(chan, data, length)      relay_reserve(chan, length) -  callbacks: +  callbacks::      subbuf_start(buf, subbuf, prev_subbuf, prev_padding)      buf_mapped(buf, filp) @@ -180,7 +187,7 @@ TBD(curr. line MT:/API/)      create_buf_file(filename, parent, mode, buf, is_global)      remove_buf_file(dentry) -  helper functions: +  helper functions::      relay_buf_full(buf)      subbuf_start_reserve(buf, length) @@ -215,41 +222,41 @@ the file(s) created in create_buf_file() and is called during  relay_close().  Here are some typical definitions for these callbacks, in this case -using debugfs: - -/* - * create_buf_file() callback.  Creates relay file in debugfs. - */ -static struct dentry *create_buf_file_handler(const char *filename, -                                              struct dentry *parent, -                                              umode_t mode, -                                              struct rchan_buf *buf, -                                              int *is_global) -{ -        return debugfs_create_file(filename, mode, parent, buf, -	                           &relay_file_operations); -} - -/* - * remove_buf_file() callback.  Removes relay file from debugfs. - */ -static int remove_buf_file_handler(struct dentry *dentry) -{ -        debugfs_remove(dentry); - -        return 0; -} - -/* - * relay interface callbacks - */ -static struct rchan_callbacks relay_callbacks = -{ -        .create_buf_file = create_buf_file_handler, -        .remove_buf_file = remove_buf_file_handler, -}; - -And an example relay_open() invocation using them: +using debugfs:: + +    /* +    * create_buf_file() callback.  
Creates relay file in debugfs. +    */ +    static struct dentry *create_buf_file_handler(const char *filename, +						struct dentry *parent, +						umode_t mode, +						struct rchan_buf *buf, +						int *is_global) +    { +	    return debugfs_create_file(filename, mode, parent, buf, +				    &relay_file_operations); +    } + +    /* +    * remove_buf_file() callback.  Removes relay file from debugfs. +    */ +    static int remove_buf_file_handler(struct dentry *dentry) +    { +	    debugfs_remove(dentry); + +	    return 0; +    } + +    /* +    * relay interface callbacks +    */ +    static struct rchan_callbacks relay_callbacks = +    { +	    .create_buf_file = create_buf_file_handler, +	    .remove_buf_file = remove_buf_file_handler, +    }; + +And an example relay_open() invocation using them::    chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks, NULL); @@ -339,23 +346,23 @@ whether or not to actually move on to the next sub-buffer.  To implement 'no-overwrite' mode, the userspace client would provide  an implementation of the subbuf_start() callback something like the -following: +following:: -static int subbuf_start(struct rchan_buf *buf, -                        void *subbuf, -			void *prev_subbuf, -			unsigned int prev_padding) -{ -	if (prev_subbuf) -		*((unsigned *)prev_subbuf) = prev_padding; +    static int subbuf_start(struct rchan_buf *buf, +			    void *subbuf, +			    void *prev_subbuf, +			    unsigned int prev_padding) +    { +	    if (prev_subbuf) +		    *((unsigned *)prev_subbuf) = prev_padding; -	if (relay_buf_full(buf)) -		return 0; +	    if (relay_buf_full(buf)) +		    return 0; -	subbuf_start_reserve(buf, sizeof(unsigned int)); +	    subbuf_start_reserve(buf, sizeof(unsigned int)); -	return 1; -} +	    return 1; +    }  If the current buffer is full, i.e. 
all sub-buffers remain unconsumed,  the callback returns 0 to indicate that the buffer switch should not @@ -370,20 +377,20 @@ ready sub-buffers will relay_buf_full() return 0, in which case the  buffer switch can continue.  The implementation of the subbuf_start() callback for 'overwrite' mode -would be very similar: +would be very similar:: -static int subbuf_start(struct rchan_buf *buf, -                        void *subbuf, -			void *prev_subbuf, -			size_t prev_padding) -{ -	if (prev_subbuf) -		*((unsigned *)prev_subbuf) = prev_padding; +    static int subbuf_start(struct rchan_buf *buf, +			    void *subbuf, +			    void *prev_subbuf, +			    size_t prev_padding) +    { +	    if (prev_subbuf) +		    *((unsigned *)prev_subbuf) = prev_padding; -	subbuf_start_reserve(buf, sizeof(unsigned int)); +	    subbuf_start_reserve(buf, sizeof(unsigned int)); -	return 1; -} +	    return 1; +    }  In this case, the relay_buf_full() check is meaningless and the  callback always returns 1, causing the buffer switch to occur diff --git a/Documentation/filesystems/romfs.txt b/Documentation/filesystems/romfs.rst index e2b07cc9120a..465b11efa9be 100644 --- a/Documentation/filesystems/romfs.txt +++ b/Documentation/filesystems/romfs.rst @@ -1,4 +1,8 @@ -ROMFS - ROM FILE SYSTEM +.. SPDX-License-Identifier: GPL-2.0 + +======================= +ROMFS - ROM File System +=======================  This is a quite dumb, read only filesystem, mainly for initial RAM  disks of installation disks.  It has grown up by the need of having @@ -51,9 +55,9 @@ the 16 byte padding for the name and the contents, also 16+14+15 = 45  bytes.  This is quite rare however, since most file names are longer  than 3 bytes, and shorter than 15 bytes. -The layout of the filesystem is the following: +The layout of the filesystem is the following:: -offset	    content + offset	    content  	+---+---+---+---+    0	| - | r | o | m |  \ @@ -84,9 +88,9 @@ the source.  
This algorithm was chosen because although it's not quite  reliable, it does not require any tables, and it is very simple.  The following bytes are now part of the file system; each file header -must begin on a 16 byte boundary. +must begin on a 16 byte boundary:: -offset	    content + offset	    content       	+---+---+---+---+    0	| next filehdr|X|	The offset of the next file header @@ -114,7 +118,9 @@ file is user and group 0, this should never be a problem for the  intended use.  The mapping of the 8 possible values to file types is  the following: +==	=============== ============================================  	  mapping		spec.info means +==	=============== ============================================   0	hard link	link destination [file header]   1	directory	first file's header   2	regular file	unused, must be zero [MBZ] @@ -123,6 +129,7 @@ the following:   5	char device		    - " -   6	socket		unused, MBZ   7	fifo		unused, MBZ +==	=============== ============================================  Note that hard links are specifically marked in this filesystem, but  they will behave as you can expect (i.e. share the inode number). @@ -158,24 +165,24 @@ to [email protected], the content is irrelevant.  Pending issues:  - Permissions and owner information are pretty essential features of a -Un*x like system, but romfs does not provide the full possibilities. -I have never found this limiting, but others might. +  Un*x like system, but romfs does not provide the full possibilities. +  I have never found this limiting, but others might.  - The file system is read only, so it can be very small, but in case -one would want to write _anything_ to a file system, he still needs -a writable file system, thus negating the size advantages.  Possible -solutions: implement write access as a compile-time option, or a new, -similarly small writable filesystem for RAM disks. 
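The romfs.rst hunks above require every file header to start on a 16 byte boundary. They also describe the first header word as "next filehdr|X|" and map 8 possible values to file types. The 16 byte alignment leaves the low four bits of that word free for flags.

The sketch below rests on a few assumptions not spelled out in the visible hunks:

- The word has already been converted from on-disk byte order to host order.
- Bits 0-2 carry the type value (0..7) from the mapping table.
- Bit 3 is an "executable" flag.
- The remaining bits are the aligned offset of the next header.

All helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants; the bit split is an assumption (see lead-in). */
#define ROMFS_ALIGN      16u
#define ROMFS_TYPE_MASK  0x7u   /* 3 bits -> the 8 type values */
#define ROMFS_EXEC_BIT   0x8u   /* assumed "executable" flag   */

/* Round a length up to the next 16 byte boundary (e.g. a padded name). */
static uint32_t romfs_align16(uint32_t len)
{
    return (len + (ROMFS_ALIGN - 1)) & ~(ROMFS_ALIGN - 1);
}

struct romfs_next {
    uint32_t next_offset; /* 16 byte aligned offset of the next header */
    unsigned type;        /* 0..7, per the mapping table              */
    bool     executable;
};

/* Split a host-order "next filehdr" word into its fields. */
static struct romfs_next romfs_decode_next(uint32_t word)
{
    struct romfs_next n;

    n.next_offset = word & ~(ROMFS_ALIGN - 1);
    n.type        = word & ROMFS_TYPE_MASK;
    n.executable  = (word & ROMFS_EXEC_BIT) != 0;
    return n;
}
```

Packing flags into the alignment slack is what lets each header stay at 16 bytes without a separate type field.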
+  one would want to write _anything_ to a file system, he still needs +  a writable file system, thus negating the size advantages.  Possible +  solutions: implement write access as a compile-time option, or a new, +  similarly small writable filesystem for RAM disks.  - Since the files are only required to have alignment on a 16 byte -boundary, it is currently possibly suboptimal to read or execute files -from the filesystem.  It might be resolved by reordering file data to -have most of it (i.e. except the start and the end) laying at "natural" -boundaries, thus it would be possible to directly map a big portion of -the file contents to the mm subsystem. +  boundary, it is currently possibly suboptimal to read or execute files +  from the filesystem.  It might be resolved by reordering file data to +  have most of it (i.e. except the start and the end) laying at "natural" +  boundaries, thus it would be possible to directly map a big portion of +  the file contents to the mm subsystem.  - Compression might be an useful feature, but memory is quite a -limiting factor in my eyes. +  limiting factor in my eyes.  - Where it is used? @@ -183,4 +190,5 @@ limiting factor in my eyes.  Have fun, +  Janos Farkas <[email protected]> diff --git a/Documentation/filesystems/squashfs.txt b/Documentation/filesystems/squashfs.rst index e5274f84dc56..df42106bae71 100644 --- a/Documentation/filesystems/squashfs.txt +++ b/Documentation/filesystems/squashfs.rst @@ -1,7 +1,11 @@ -SQUASHFS 4.0 FILESYSTEM +.. SPDX-License-Identifier: GPL-2.0 + +======================= +Squashfs 4.0 Filesystem  =======================  Squashfs is a compressed read-only filesystem for Linux. +  It uses zlib, lz4, lzo, or xz compression to compress files, inodes and  directories.  Inodes in the system are very small and all blocks are packed to  minimise data overhead. Block sizes greater than 4K are supported up to a @@ -15,31 +19,33 @@ needed.  
Mailing list: [email protected]  Web site: www.squashfs.org -1. FILESYSTEM FEATURES +1. Filesystem Features  ----------------------  Squashfs filesystem features versus Cramfs: +============================== 	=========		==========  				Squashfs		Cramfs - -Max filesystem size:		2^64			256 MiB -Max file size:			~ 2 TiB			16 MiB -Max files:			unlimited		unlimited -Max directories:		unlimited		unlimited -Max entries per directory:	unlimited		unlimited -Max block size:			1 MiB			4 KiB -Metadata compression:		yes			no -Directory indexes:		yes			no -Sparse file support:		yes			no -Tail-end packing (fragments):	yes			no -Exportable (NFS etc.):		yes			no -Hard link support:		yes			no -"." and ".." in readdir:	yes			no -Real inode numbers:		yes			no -32-bit uids/gids:		yes			no -File creation time:		yes			no -Xattr support:			yes			no -ACL support:			no			no +============================== 	=========		========== +Max filesystem size		2^64			256 MiB +Max file size			~ 2 TiB			16 MiB +Max files			unlimited		unlimited +Max directories			unlimited		unlimited +Max entries per directory	unlimited		unlimited +Max block size			1 MiB			4 KiB +Metadata compression		yes			no +Directory indexes		yes			no +Sparse file support		yes			no +Tail-end packing (fragments)	yes			no +Exportable (NFS etc.)		yes			no +Hard link support		yes			no +"." and ".." in readdir		yes			no +Real inode numbers		yes			no +32-bit uids/gids		yes			no +File creation time		yes			no +Xattr support			yes			no +ACL support			no			no +============================== 	=========		==========  Squashfs compresses data, inodes and directories.  In addition, inode and  directory data are highly compacted, and packed on byte boundaries.  Each @@ -47,7 +53,7 @@ compressed inode is on average 8 bytes in length (the exact length varies on  file type, i.e. regular file, directory, symbolic link, and block/char device  inodes have different sizes). -2. USING SQUASHFS +2. 
Using Squashfs  -----------------  As squashfs is a read-only filesystem, the mksquashfs program must be used to @@ -58,11 +64,11 @@ obtained from this site also.  The squashfs-tools development tree is now located on kernel.org  	git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git -3. SQUASHFS FILESYSTEM DESIGN +3. Squashfs Filesystem Design  -----------------------------  A squashfs filesystem consists of a maximum of nine parts, packed together on a -byte alignment: +byte alignment::  	 ---------------  	|  superblock 	| @@ -229,15 +235,15 @@ location of the xattr list inside each inode, a 32-bit xattr id  is stored.  This xattr id is mapped into the location of the xattr  list using a second xattr id lookup table. -4. TODOS AND OUTSTANDING ISSUES +4. TODOs and Outstanding Issues  ------------------------------- -4.1 Todo list +4.1 TODO list  -------------  Implement ACL support. -4.2 Squashfs internal cache +4.2 Squashfs Internal Cache  ---------------------------  Blocks in Squashfs are compressed.  To avoid repeatedly decompressing diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.rst index ddf15b1b0d5a..290891c3fecb 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.rst @@ -1,32 +1,36 @@ +.. SPDX-License-Identifier: GPL-2.0 -sysfs - _The_ filesystem for exporting kernel objects.  +===================================================== +sysfs - _The_ filesystem for exporting kernel objects +=====================================================  Patrick Mochel	<[email protected]> +  Mike Murphy <[email protected]> -Revised:    16 August 2011 -Original:   10 January 2003 +:Revised:    16 August 2011 +:Original:   10 January 2003  What it is:  ~~~~~~~~~~~  sysfs is a ram-based filesystem initially based on ramfs. It provides -a means to export kernel data structures, their attributes, and the  -linkages between them to userspace.  
+a means to export kernel data structures, their attributes, and the +linkages between them to userspace.  sysfs is tied inherently to the kobject infrastructure. Please read  Documentation/kobject.txt for more information concerning the kobject -interface.  +interface.  Using sysfs  ~~~~~~~~~~~  sysfs is always compiled in if CONFIG_SYSFS is defined. You can access -it by doing: +it by doing:: -    mount -t sysfs sysfs /sys  +    mount -t sysfs sysfs /sys  Directory Creation @@ -37,7 +41,7 @@ created for it in sysfs. That directory is created as a subdirectory  of the kobject's parent, expressing internal object hierarchies to  userspace. Top-level directories in sysfs represent the common  ancestors of object hierarchies; i.e. the subsystems the objects -belong to.  +belong to.  Sysfs internally stores a pointer to the kobject that implements a  directory in the kernfs_node object associated with the directory. In @@ -58,63 +62,63 @@ attributes.  Attributes should be ASCII text files, preferably with only one value  per file. It is noted that it may not be efficient to contain only one  value per file, so it is socially acceptable to express an array of -values of the same type.  +values of the same type.  Mixing types, expressing multiple lines of data, and doing fancy  formatting of data is heavily frowned upon. Doing these things may get -you publicly humiliated and your code rewritten without notice.  +you publicly humiliated and your code rewritten without notice. 
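The sysfs.rst hunks above ask for attributes that are ASCII text files, preferably with one value per file. The sketch below shows the consuming side from userspace. It reads the whole attribute with a single read() and parses exactly one integer. It accepts an optional trailing newline and rejects anything else, so a file that mixes values fails loudly.

The function name `read_attr_long` is hypothetical; it is not a kernel or libc API. The 4096 byte buffer is an assumption matching a common page size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ATTR_BUF_SIZE 4096 /* assumption: one page on common configurations */

/* Hypothetical helper: read a one-value attribute such as "42\n".
 * Returns 0 on success, -1 on I/O or parse error. */
static int read_attr_long(const char *path, long *out)
{
    char buf[ATTR_BUF_SIZE];
    char *end;
    ssize_t n;
    long v;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = read(fd, buf, sizeof(buf) - 1); /* whole attribute in one read */
    close(fd);
    if (n <= 0)
        return -1;
    buf[n] = '\0';

    errno = 0;
    v = strtol(buf, &end, 0);
    if (errno != 0 || end == buf)
        return -1;             /* no number, or out of range */
    if (*end == '\n')
        end++;                 /* a single trailing newline is fine */
    if (*end != '\0')
        return -1;             /* more than one value: reject */
    *out = v;
    return 0;
}
```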
-An attribute definition is simply: +An attribute definition is simply:: -struct attribute { -        char                    * name; -        struct module		*owner; -        umode_t                 mode; -}; +    struct attribute { +	    char                    * name; +	    struct module		*owner; +	    umode_t                 mode; +    }; -int sysfs_create_file(struct kobject * kobj, const struct attribute * attr); -void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr); +    int sysfs_create_file(struct kobject * kobj, const struct attribute * attr); +    void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr);  A bare attribute contains no means to read or write the value of the  attribute. Subsystems are encouraged to define their own attribute  structure and wrapper functions for adding and removing attributes for -a specific object type.  +a specific object type. -For example, the driver model defines struct device_attribute like: +For example, the driver model defines struct device_attribute like:: -struct device_attribute { -	struct attribute	attr; -	ssize_t (*show)(struct device *dev, struct device_attribute *attr, -			char *buf); -	ssize_t (*store)(struct device *dev, struct device_attribute *attr, -			 const char *buf, size_t count); -}; +    struct device_attribute { +	    struct attribute	attr; +	    ssize_t (*show)(struct device *dev, struct device_attribute *attr, +			    char *buf); +	    ssize_t (*store)(struct device *dev, struct device_attribute *attr, +			    const char *buf, size_t count); +    }; -int device_create_file(struct device *, const struct device_attribute *); -void device_remove_file(struct device *, const struct device_attribute *); +    int device_create_file(struct device *, const struct device_attribute *); +    void device_remove_file(struct device *, const struct device_attribute *); -It also defines this helper for defining device attributes:  +It also defines this helper for 
defining device attributes:: -#define DEVICE_ATTR(_name, _mode, _show, _store) \ -struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store) +    #define DEVICE_ATTR(_name, _mode, _show, _store) \ +    struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store) -For example, declaring +For example, declaring:: -static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo); +    static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo); -is equivalent to doing: +is equivalent to doing:: -static struct device_attribute dev_attr_foo = { -	.attr = { -		.name = "foo", -		.mode = S_IWUSR | S_IRUGO, -	}, -	.show = show_foo, -	.store = store_foo, -}; +    static struct device_attribute dev_attr_foo = { +	    .attr = { +		    .name = "foo", +		    .mode = S_IWUSR | S_IRUGO, +	    }, +	    .show = show_foo, +	    .store = store_foo, +    };  Note as stated in include/linux/kernel.h "OTHER_WRITABLE?  Generally  considered a bad idea." so trying to set a sysfs file writable for @@ -127,15 +131,21 @@ readable. The above case could be shortened to:  static struct device_attribute dev_attr_foo = __ATTR_RW(foo);  the list of helpers available to define your wrapper function is: -__ATTR_RO(name): assumes default name_show and mode 0444 -__ATTR_WO(name): assumes a name_store only and is restricted to mode + +__ATTR_RO(name): +		 assumes default name_show and mode 0444 +__ATTR_WO(name): +		 assumes a name_store only and is restricted to mode                   0200 that is root write access only. -__ATTR_RO_MODE(name, mode): fore more restrictive RO access currently +__ATTR_RO_MODE(name, mode): +	         fore more restrictive RO access currently                   only use case is the EFI System Resource Table                   (see drivers/firmware/efi/esrt.c) -__ATTR_RW(name): assumes default name_show, name_store and setting +__ATTR_RW(name): +	         assumes default name_show, name_store and setting                   mode to 0644. 
-__ATTR_NULL: which sets the name to NULL and is used as end of list +__ATTR_NULL: +	         which sets the name to NULL and is used as end of list                   indicator (see: kernel/workqueue.c)  Subsystem-Specific Callbacks @@ -143,12 +153,12 @@ Subsystem-Specific Callbacks  When a subsystem defines a new attribute type, it must implement a  set of sysfs operations for forwarding read and write calls to the -show and store methods of the attribute owners.  +show and store methods of the attribute owners:: -struct sysfs_ops { -        ssize_t (*show)(struct kobject *, struct attribute *, char *); -        ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t); -}; +    struct sysfs_ops { +	    ssize_t (*show)(struct kobject *, struct attribute *, char *); +	    ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t); +    };  [ Subsystems should have already defined a struct kobj_type as a  descriptor for this type, which is where the sysfs_ops pointer is @@ -157,29 +167,29 @@ stored. See the kobject documentation for more information. ]  When a file is read or written, sysfs calls the appropriate method  for the type. The method then translates the generic struct kobject  and struct attribute pointers to the appropriate pointer types, and -calls the associated methods.  +calls the associated methods. 
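The hunk above says a subsystem's sysfs_ops methods translate the generic struct kobject and struct attribute pointers into their own types. The dev_attr_show() example in this diff does this with container_of(). The sketch below reproduces that translation in userspace so it can be run and inspected.

Two simplifications apply:

- container_of is rebuilt from offsetof. The kernel macro adds type checking that this one lacks.
- All struct and function names are hypothetical stand-ins for struct kobject, struct attribute, struct device and struct device_attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic types (stand-ins for kobject / attribute). */
struct generic_kobj { const char *name; };
struct generic_attr { const char *name; };

/* Hypothetical subsystem types embedding the generic ones. */
struct my_object {
    int value;
    struct generic_kobj kobj;
};

struct my_attribute {
    struct generic_attr attr;
    int (*show)(struct my_object *obj, char *buf, size_t size);
};

/* Generic entry point: sees only the embedded pointers, like sysfs_ops->show(). */
static int generic_show(struct generic_kobj *k, struct generic_attr *a,
                        char *buf, size_t size)
{
    struct my_object *obj = container_of(k, struct my_object, kobj);
    struct my_attribute *ma = container_of(a, struct my_attribute, attr);

    if (!ma->show)
        return -1; /* the kernel example returns -EIO in this case */
    return ma->show(obj, buf, size);
}

static int show_value(struct my_object *obj, char *buf, size_t size)
{
    return snprintf(buf, size, "%d\n", obj->value);
}
```

Because the generic structs are embedded rather than pointed to, no lookup table is needed: the offset alone recovers the owner.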
-To illustrate: +To illustrate:: -#define to_dev(obj) container_of(obj, struct device, kobj) -#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr) +    #define to_dev(obj) container_of(obj, struct device, kobj) +    #define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr) -static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr, -                             char *buf) -{ -        struct device_attribute *dev_attr = to_dev_attr(attr); -        struct device *dev = to_dev(kobj); -        ssize_t ret = -EIO; +    static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr, +				char *buf) +    { +	    struct device_attribute *dev_attr = to_dev_attr(attr); +	    struct device *dev = to_dev(kobj); +	    ssize_t ret = -EIO; -        if (dev_attr->show) -                ret = dev_attr->show(dev, dev_attr, buf); -        if (ret >= (ssize_t)PAGE_SIZE) { -                printk("dev_attr_show: %pS returned bad count\n", -                                dev_attr->show); -        } -        return ret; -} +	    if (dev_attr->show) +		    ret = dev_attr->show(dev, dev_attr, buf); +	    if (ret >= (ssize_t)PAGE_SIZE) { +		    printk("dev_attr_show: %pS returned bad count\n", +				    dev_attr->show); +	    } +	    return ret; +    } @@ -188,11 +198,11 @@ Reading/Writing Attribute Data  To read or write attributes, show() or store() methods must be  specified when declaring the attribute. 
The method types should be as -simple as those defined for device attributes: +simple as those defined for device attributes:: -ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf); -ssize_t (*store)(struct device *dev, struct device_attribute *attr, -                 const char *buf, size_t count); +    ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf); +    ssize_t (*store)(struct device *dev, struct device_attribute *attr, +		    const char *buf, size_t count);  IOW, they should take only an object, an attribute, and a buffer as parameters. @@ -200,11 +210,11 @@ IOW, they should take only an object, an attribute, and a buffer as parameters.  sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the  method. Sysfs will call the method exactly once for each read or  write. This forces the following behavior on the method -implementations:  +implementations: -- On read(2), the show() method should fill the entire buffer.  +- On read(2), the show() method should fill the entire buffer.    Recall that an attribute should only be exporting one value, or an -  array of similar values, so this shouldn't be that expensive.  +  array of similar values, so this shouldn't be that expensive.    This allows userspace to do partial reads and forward seeks    arbitrarily over the entire file at will. If userspace seeks back to @@ -218,10 +228,10 @@ implementations:    When writing sysfs files, userspace processes should first read the    entire file, modify the values it wishes to change, then write the -  entire buffer back.  +  entire buffer back.    Attribute method implementations should operate on an identical -  buffer when reading and writing values.  +  buffer when reading and writing values.  Other notes: @@ -229,7 +239,7 @@ Other notes:    file position.  - The buffer will always be PAGE_SIZE bytes in length. On i386, this -  is 4096.  +  is 4096.  
- show() methods should return the number of bytes printed into the    buffer. This is the return value of scnprintf(). @@ -246,31 +256,31 @@ Other notes:    through, be sure to return an error.  - The object passed to the methods will be pinned in memory via sysfs -  referencing counting its embedded object. However, the physical  -  entity (e.g. device) the object represents may not be present. Be  -  sure to have a way to check this, if necessary.  +  referencing counting its embedded object. However, the physical +  entity (e.g. device) the object represents may not be present. Be +  sure to have a way to check this, if necessary. -A very simple (and naive) implementation of a device attribute is: +A very simple (and naive) implementation of a device attribute is:: -static ssize_t show_name(struct device *dev, struct device_attribute *attr, -                         char *buf) -{ -	return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name); -} +    static ssize_t show_name(struct device *dev, struct device_attribute *attr, +			    char *buf) +    { +	    return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name); +    } -static ssize_t store_name(struct device *dev, struct device_attribute *attr, -                          const char *buf, size_t count) -{ -        snprintf(dev->name, sizeof(dev->name), "%.*s", -                 (int)min(count, sizeof(dev->name) - 1), buf); -	return count; -} +    static ssize_t store_name(struct device *dev, struct device_attribute *attr, +			    const char *buf, size_t count) +    { +	    snprintf(dev->name, sizeof(dev->name), "%.*s", +		    (int)min(count, sizeof(dev->name) - 1), buf); +	    return count; +    } -static DEVICE_ATTR(name, S_IRUGO, show_name, store_name); +    static DEVICE_ATTR(name, S_IRUGO, show_name, store_name); -(Note that the real implementation doesn't allow userspace to set the  +(Note that the real implementation doesn't allow userspace to set the  name for a device.) 
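The notes above say show() should return the number of bytes placed in the buffer, which is what scnprintf() returns. The dev_attr_show() wrapper also flags counts of PAGE_SIZE or more. The reason for scnprintf() over snprintf() is truncation: snprintf() reports the length the output would have had, which can exceed the buffer.

scnprintf() exists only in the kernel. The sketch below is a userspace stand-in with the same contract, built on vsnprintf(). The name `my_scnprintf` is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Userspace stand-in for scnprintf(): return the number of characters
 * actually stored in buf (excluding the NUL), not the untruncated length. */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int i;

    va_start(args, fmt);
    i = vsnprintf(buf, size, fmt, args);
    va_end(args);

    if (i < 0)
        return 0;                        /* output error: report nothing written */
    if ((size_t)i < size)
        return i;                        /* everything fit */
    return size ? (int)(size - 1) : 0;   /* truncated: size-1 chars plus NUL */
}
```

In a show() method, the corresponding kernel call is the `scnprintf(buf, PAGE_SIZE, ...)` used by the naive example in the hunk. Its return value can never reach PAGE_SIZE.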
@@ -278,25 +288,25 @@ Top Level Directory Layout  ~~~~~~~~~~~~~~~~~~~~~~~~~~  The sysfs directory arrangement exposes the relationship of kernel -data structures.  +data structures. -The top level sysfs directory looks like: +The top level sysfs directory looks like:: -block/ -bus/ -class/ -dev/ -devices/ -firmware/ -net/ -fs/ +    block/ +    bus/ +    class/ +    dev/ +    devices/ +    firmware/ +    net/ +    fs/  devices/ contains a filesystem representation of the device tree. It maps  directly to the internal kernel device tree, which is a hierarchy of -struct device.  +struct device.  bus/ contains flat directory layout of the various bus types in the -kernel. Each bus's directory contains two subdirectories: +kernel. Each bus's directory contains two subdirectories::  	devices/  	drivers/ @@ -331,71 +341,71 @@ Current Interfaces  The following interface layers currently exist in sysfs: -- devices (include/linux/device.h) ----------------------------------- -Structure: +devices (include/linux/device.h) +-------------------------------- +Structure:: -struct device_attribute { -	struct attribute	attr; -	ssize_t (*show)(struct device *dev, struct device_attribute *attr, -			char *buf); -	ssize_t (*store)(struct device *dev, struct device_attribute *attr, -			 const char *buf, size_t count); -}; +    struct device_attribute { +	    struct attribute	attr; +	    ssize_t (*show)(struct device *dev, struct device_attribute *attr, +			    char *buf); +	    ssize_t (*store)(struct device *dev, struct device_attribute *attr, +			    const char *buf, size_t count); +    }; -Declaring: +Declaring:: -DEVICE_ATTR(_name, _mode, _show, _store); +    DEVICE_ATTR(_name, _mode, _show, _store); -Creation/Removal: +Creation/Removal:: -int device_create_file(struct device *dev, const struct device_attribute * attr); -void device_remove_file(struct device *dev, const struct device_attribute * attr); +    int device_create_file(struct device *dev, const struct device_attribute * 
attr); +    void device_remove_file(struct device *dev, const struct device_attribute * attr); -- bus drivers (include/linux/device.h) --------------------------------------- -Structure: +bus drivers (include/linux/device.h) +------------------------------------ +Structure:: -struct bus_attribute { -        struct attribute        attr; -        ssize_t (*show)(struct bus_type *, char * buf); -        ssize_t (*store)(struct bus_type *, const char * buf, size_t count); -}; +    struct bus_attribute { +	    struct attribute        attr; +	    ssize_t (*show)(struct bus_type *, char * buf); +	    ssize_t (*store)(struct bus_type *, const char * buf, size_t count); +    }; -Declaring: +Declaring:: -static BUS_ATTR_RW(name); -static BUS_ATTR_RO(name); -static BUS_ATTR_WO(name); +    static BUS_ATTR_RW(name); +    static BUS_ATTR_RO(name); +    static BUS_ATTR_WO(name); -Creation/Removal: +Creation/Removal:: -int bus_create_file(struct bus_type *, struct bus_attribute *); -void bus_remove_file(struct bus_type *, struct bus_attribute *); +    int bus_create_file(struct bus_type *, struct bus_attribute *); +    void bus_remove_file(struct bus_type *, struct bus_attribute *); -- device drivers (include/linux/device.h) ------------------------------------------ +device drivers (include/linux/device.h) +--------------------------------------- -Structure: +Structure:: -struct driver_attribute { -        struct attribute        attr; -        ssize_t (*show)(struct device_driver *, char * buf); -        ssize_t (*store)(struct device_driver *, const char * buf, -                         size_t count); -}; +    struct driver_attribute { +	    struct attribute        attr; +	    ssize_t (*show)(struct device_driver *, char * buf); +	    ssize_t (*store)(struct device_driver *, const char * buf, +			    size_t count); +    }; -Declaring: +Declaring:: -DRIVER_ATTR_RO(_name) -DRIVER_ATTR_RW(_name) +    DRIVER_ATTR_RO(_name) +    DRIVER_ATTR_RW(_name) -Creation/Removal: 
+Creation/Removal:: -int driver_create_file(struct device_driver *, const struct driver_attribute *); -void driver_remove_file(struct device_driver *, const struct driver_attribute *); +    int driver_create_file(struct device_driver *, const struct driver_attribute *); +    void driver_remove_file(struct device_driver *, const struct driver_attribute *);  Documentation diff --git a/Documentation/filesystems/sysv-fs.txt b/Documentation/filesystems/sysv-fs.rst index 253b50d1328e..89e40911ad7c 100644 --- a/Documentation/filesystems/sysv-fs.txt +++ b/Documentation/filesystems/sysv-fs.rst @@ -1,25 +1,40 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================== +SystemV Filesystem +================== +  It implements all of    - Xenix FS,    - SystemV/386 FS,    - Coherent FS.  To install: +  * Answer the 'System V and Coherent filesystem support' question with 'y'    when configuring the kernel. -* To mount a disk or a partition, use +* To mount a disk or a partition, use:: +      mount [-r] -t sysv device mountpoint -  The file system type names + +  The file system type names:: +                 -t sysv                 -t xenix                 -t coherent +    may be used interchangeably, but the last two will eventually disappear.  Bugs in the present implementation: +  - Coherent FS: +    - The "free list interleave" n:m is currently ignored.    - Only file systems with no filesystem name and no pack name are recognized. -  (See Coherent "man mkfs" for a description of these features.) +    (See Coherent "man mkfs" for a description of these features.) +  - SystemV Release 2 FS: +    The superblock is only searched in the blocks 9, 15, 18, which    corresponds to the beginning of track 1 on floppy disks. No support    for this FS on hard disk yet. @@ -28,12 +43,14 @@ Bugs in the present implementation:  These filesystems are rather similar. 
Here is a comparison with Minix FS:  * Linux fdisk reports on partitions +    - Minix FS     0x81 Linux/Minix    - Xenix FS     ??    - SystemV FS   ??    - Coherent FS  0x08 AIX bootable  * Size of a block or zone (data allocation unit on disk) +    - Minix FS     1024    - Xenix FS     1024 (also 512 ??)    - SystemV FS   1024 (also 512 and 2048) @@ -45,37 +62,51 @@ These filesystems are rather similar. Here is a comparison with Minix FS:    all the block numbers (including the super block) are offset by one track.  * Byte ordering of "short" (16 bit entities) on disk: +    - Minix FS     little endian  0 1    - Xenix FS     little endian  0 1    - SystemV FS   little endian  0 1    - Coherent FS  little endian  0 1 +    Of course, this affects only the file system, not the data of files on it!  * Byte ordering of "long" (32 bit entities) on disk: +    - Minix FS     little endian  0 1 2 3    - Xenix FS     little endian  0 1 2 3    - SystemV FS   little endian  0 1 2 3    - Coherent FS  PDP-11         2 3 0 1 +    Of course, this affects only the file system, not the data of files on it!  * Inode on disk: "short", 0 means non-existent, the root dir ino is: -  - Minix FS                            1 -  - Xenix FS, SystemV FS, Coherent FS   2 + +  =================================  == +  Minix FS                            1 +  Xenix FS, SystemV FS, Coherent FS   2 +  =================================  ==  * Maximum number of hard links to a file: -  - Minix FS     250 -  - Xenix FS     ?? -  - SystemV FS   ?? -  - Coherent FS  >=10000 + +  ===========  ========= +  Minix FS     250 +  Xenix FS     ?? +  SystemV FS   ?? +  Coherent FS  >=10000 +  ===========  =========  * Free inode management: -  - Minix FS                             a bitmap + +  - Minix FS +      a bitmap    - Xenix FS, SystemV FS, Coherent FS        There is a cache of a certain number of free inodes in the super-block.        
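The sysv-fs.rst comparison above gives two byte orders for 32-bit "long" values on disk:

- Coherent FS uses PDP-11 order, "2 3 0 1".
- The other filesystems use little endian, "0 1 2 3".

This sketch reads the digits as "significance of the byte stored at each on-disk position". Under that reading, PDP-11 order stores the high 16-bit half first, with each half little endian. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Little endian "0 1 2 3": least significant byte stored first. */
static uint32_t get_le32(const unsigned char d[4])
{
    return (uint32_t)d[0] | ((uint32_t)d[1] << 8) |
           ((uint32_t)d[2] << 16) | ((uint32_t)d[3] << 24);
}

/* PDP-11 "2 3 0 1": high half first, each 16-bit half little endian. */
static uint32_t get_pdp32(const unsigned char d[4])
{
    return ((uint32_t)d[0] << 16) | ((uint32_t)d[1] << 24) |
           (uint32_t)d[2] | ((uint32_t)d[3] << 8);
}
```

Shorts need no special case: the comparison lists all four filesystems as storing them little endian.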
When it is exhausted, new free inodes are found using a linear search.  * Free block management: -  - Minix FS                             a bitmap + +  - Minix FS +      a bitmap    - Xenix FS, SystemV FS, Coherent FS        Free blocks are organized in a "free list". Maybe a misleading term,        since it is not true that every free block contains a pointer to @@ -86,13 +117,18 @@ These filesystems are rather similar. Here is a comparison with Minix FS:        0 on Xenix FS and SystemV FS, with a block zeroed out on Coherent FS.  * Super-block location: -  - Minix FS     block 1 = bytes 1024..2047 -  - Xenix FS     block 1 = bytes 1024..2047 -  - SystemV FS   bytes 512..1023 -  - Coherent FS  block 1 = bytes 512..1023 + +  ===========  ========================== +  Minix FS     block 1 = bytes 1024..2047 +  Xenix FS     block 1 = bytes 1024..2047 +  SystemV FS   bytes 512..1023 +  Coherent FS  block 1 = bytes 512..1023 +  ===========  ==========================  * Super-block layout: -  - Minix FS + +  - Minix FS:: +                      unsigned short s_ninodes;                      unsigned short s_nzones;                      unsigned short s_imap_blocks; @@ -101,7 +137,9 @@ These filesystems are rather similar. Here is a comparison with Minix FS:                      unsigned short s_log_zone_size;                      unsigned long s_max_size;                      unsigned short s_magic; -  - Xenix FS, SystemV FS, Coherent FS + +  - Xenix FS, SystemV FS, Coherent FS:: +                      unsigned short s_firstdatazone;                      unsigned long  s_nzones;                      unsigned short s_fzone_count; @@ -120,23 +158,33 @@ These filesystems are rather similar. 
Here is a comparison with Minix FS:                      unsigned short s_interleave_m,s_interleave_n; -- Coherent FS only                      char           s_fname[6];                      char           s_fpack[6]; +      then they differ considerably: -        Xenix FS + +        Xenix FS:: +                      char           s_clean;                      char           s_fill[371];                      long           s_magic;                      long           s_type; -        SystemV FS + +        SystemV FS:: +                      long           s_fill[12 or 14];                      long           s_state;                      long           s_magic;                      long           s_type; -        Coherent FS + +        Coherent FS:: +                      unsigned long  s_unique; +      Note that Coherent FS has no magic.  * Inode layout: -  - Minix FS + +  - Minix FS:: +                      unsigned short i_mode;                      unsigned short i_uid;                      unsigned long  i_size; @@ -144,7 +192,9 @@ These filesystems are rather similar. Here is a comparison with Minix FS:                      unsigned char  i_gid;                      unsigned char  i_nlinks;                      unsigned short i_zone[7+1+1]; -  - Xenix FS, SystemV FS, Coherent FS + +  - Xenix FS, SystemV FS, Coherent FS:: +                      unsigned short i_mode;                      unsigned short i_nlink;                      unsigned short i_uid; @@ -155,38 +205,55 @@ These filesystems are rather similar. 
Here is a comparison with Minix FS:                      unsigned long  i_mtime;                      unsigned long  i_ctime; +  * Regular file data blocks are organized as -  - Minix FS -               7 direct blocks -               1 indirect block (pointers to blocks) -               1 double-indirect block (pointer to pointers to blocks) -  - Xenix FS, SystemV FS, Coherent FS -              10 direct blocks -               1 indirect block (pointers to blocks) -               1 double-indirect block (pointer to pointers to blocks) -               1 triple-indirect block (pointer to pointers to pointers to blocks) -* Inode size, inodes per block -  - Minix FS        32   32 -  - Xenix FS        64   16 -  - SystemV FS      64   16 -  - Coherent FS     64    8 +  - Minix FS: + +             - 7 direct blocks +	     - 1 indirect block (pointers to blocks) +             - 1 double-indirect block (pointer to pointers to blocks) + +  - Xenix FS, SystemV FS, Coherent FS: + +             - 10 direct blocks +             -  1 indirect block (pointers to blocks) +             -  1 double-indirect block (pointer to pointers to blocks) +             -  1 triple-indirect block (pointer to pointers to pointers to blocks) + + +  ===========  ==========   ================ +               Inode size   inodes per block +  ===========  ==========   ================ +  Minix FS        32        32 +  Xenix FS        64        16 +  SystemV FS      64        16 +  Coherent FS     64        8 +  ===========  ==========   ================  * Directory entry on disk -  - Minix FS + +  - Minix FS:: +                      unsigned short inode;                      char name[14/30]; -  - Xenix FS, SystemV FS, Coherent FS + +  - Xenix FS, SystemV FS, Coherent FS:: +                      unsigned short inode;                      char name[14]; -* Dir entry size, dir entries per block -  - Minix FS     16/32    64/32 -  - Xenix FS     16       64 -  - SystemV FS   16       64 -  - 
Coherent FS  16       32 +  ===========    ==============    ===================== +                 Dir entry size    dir entries per block +  ===========    ==============    ===================== +  Minix FS       16/32             64/32 +  Xenix FS       16                64 +  SystemV FS     16                64 +  Coherent FS    16                32 +  ===========    ==============    =====================  * How to implement symbolic links such that the host fsck doesn't scream: +    - Minix FS     normal    - Xenix FS     kludge: as regular files with  chmod 1000    - SystemV FS   ?? diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.rst index 5ecbc03e6b2f..4e95929301a5 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.rst @@ -1,3 +1,9 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===== +Tmpfs +===== +  Tmpfs is a file system which keeps all files in virtual memory. @@ -14,7 +20,7 @@ If you compare it to ramfs (which was the template to create tmpfs)  you gain swapping and limit checking. Another similar thing is the RAM  disk (/dev/ram*), which simulates a fixed size hard disk in physical  RAM, where you have to create an ordinary filesystem on top. Ramdisks -cannot swap and you do not have the possibility to resize them.  +cannot swap and you do not have the possibility to resize them.  Since tmpfs lives completely in the page cache and on swap, all tmpfs  pages will be shown as "Shmem" in /proc/meminfo and "Shared" in @@ -26,7 +32,7 @@ tmpfs has the following uses:  1) There is always a kernel internal mount which you will not see at     all. This is used for shared anonymous mappings and SYSV shared -   memory.  +   memory.     This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not     set, the user visible part of tmpfs is not build. 
But the internal @@ -34,7 +40,7 @@ tmpfs has the following uses:  2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for     POSIX shared memory (shm_open, shm_unlink). Adding the following -   line to /etc/fstab should take care of this: +   line to /etc/fstab should take care of this::  	tmpfs	/dev/shm	tmpfs	defaults	0 0 @@ -56,15 +62,17 @@ tmpfs has the following uses:  tmpfs has three mount options for sizing: -size:      The limit of allocated bytes for this tmpfs instance. The  +=========  ============================================================ +size       The limit of allocated bytes for this tmpfs instance. The             default is half of your physical RAM without swap. If you             oversize your tmpfs instances the machine will deadlock             since the OOM handler will not be able to free that memory. -nr_blocks: The same as size, but in blocks of PAGE_SIZE. -nr_inodes: The maximum number of inodes for this instance. The default +nr_blocks  The same as size, but in blocks of PAGE_SIZE. +nr_inodes  The maximum number of inodes for this instance. The default             is half of the number of your physical RAM pages, or (on a             machine with highmem) the number of lowmem RAM pages,             whichever is the lower. +=========  ============================================================  These parameters accept a suffix k, m or g for kilo, mega and giga and  can be changed on remount.  The size parameter also accepts a suffix % @@ -82,6 +90,7 @@ tmpfs has a mount option to set the NUMA memory allocation policy for  all files in that instance (if CONFIG_NUMA is enabled) - which can be  adjusted on the fly via 'mount -o remount ...' 
+======================== ==============================================  mpol=default             use the process allocation policy                           (see set_mempolicy(2))  mpol=prefer:Node         prefers to allocate memory from the given Node @@ -89,6 +98,7 @@ mpol=bind:NodeList       allocates memory only from nodes in NodeList  mpol=interleave          prefers to allocate from each node in turn  mpol=interleave:NodeList allocates from each node of NodeList in turn  mpol=local		 prefers to allocate memory from the local node +======================== ==============================================  NodeList format is a comma-separated list of decimal numbers and ranges,  a range being two hyphen-separated decimal numbers, the smallest and @@ -98,9 +108,9 @@ A memory policy with a valid NodeList will be saved, as specified, for  use at file creation time.  When a task allocates a file in the file  system, the mount option memory policy will be applied with a NodeList,  if any, modified by the calling task's cpuset constraints -[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags, listed -below.  If the resulting NodeLists is the empty set, the effective memory -policy for the file will revert to "default" policy. +[See Documentation/admin-guide/cgroup-v1/cpusets.rst] and any optional flags, +listed below.  If the resulting NodeLists is the empty set, the effective +memory policy for the file will revert to "default" policy.  NUMA memory allocation policies have optional flags that can be used in  conjunction with their modes.  These optional flags can be specified @@ -109,6 +119,8 @@ See Documentation/admin-guide/mm/numa_memory_policy.rst for a list of  all available memory allocation policy mode flags and their effect on  memory policy. +:: +  	=static		is equivalent to	MPOL_F_STATIC_NODES  	=relative	is equivalent to	MPOL_F_RELATIVE_NODES @@ -128,9 +140,11 @@ on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.  
To specify the initial root directory you can use the following mount  options: -mode:	The permissions as an octal number -uid:	The user id  -gid:	The group id +====	================================== +mode	The permissions as an octal number +uid	The user id +gid	The group id +====	==================================  These options do not have any effect on remount. You can change these  parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem. @@ -141,9 +155,9 @@ will give you tmpfs instance on /mytmpfs which can allocate 10GB  RAM/SWAP in 10240 inodes and it is only accessible by root. -Author: +:Author:     Christoph Rohland <[email protected]>, 1.12.01 -Updated: +:Updated:     Hugh Dickins, 4 June 2007 -Updated: +:Updated:     KOSAKI Motohiro, 16 Mar 2010 diff --git a/Documentation/filesystems/ubifs-authentication.rst b/Documentation/filesystems/ubifs-authentication.rst index 6a9584f6ff46..16efd729bf7c 100644 --- a/Documentation/filesystems/ubifs-authentication.rst +++ b/Documentation/filesystems/ubifs-authentication.rst @@ -1,3 +1,5 @@ +.. SPDX-License-Identifier: GPL-2.0 +  :orphan:  .. UBIFS Authentication @@ -92,11 +94,11 @@ UBIFS Index & Tree Node Cache  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  Basic on-flash UBIFS entities are called *nodes*. UBIFS knows different types -of nodes. Eg. data nodes (`struct ubifs_data_node`) which store chunks of file -contents or inode nodes (`struct ubifs_ino_node`) which represent VFS inodes. -Almost all types of nodes share a common header (`ubifs_ch`) containing basic +of nodes. Eg. data nodes (``struct ubifs_data_node``) which store chunks of file +contents or inode nodes (``struct ubifs_ino_node``) which represent VFS inodes. +Almost all types of nodes share a common header (``ubifs_ch``) containing basic  information like node type, node length, a sequence number, etc. (see -`fs/ubifs/ubifs-media.h`in kernel source). Exceptions are entries of the LPT +``fs/ubifs/ubifs-media.h`` in kernel source). 
Exceptions are entries of the LPT  and some less important node types like padding nodes which are used to pad  unusable content at the end of LEBs. diff --git a/Documentation/filesystems/ubifs.txt b/Documentation/filesystems/ubifs.rst index acc80442a3bb..e6ee99762534 100644 --- a/Documentation/filesystems/ubifs.txt +++ b/Documentation/filesystems/ubifs.rst @@ -1,5 +1,11 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============== +UBI File System +=============== +  Introduction -============= +============  UBIFS file-system stands for UBI File System. UBI stands for "Unsorted  Block Images". UBIFS is a flash file system, which means it is designed @@ -79,6 +85,7 @@ Mount options  (*) == default. +====================	=======================================================  bulk_read		read more in one go to take advantage of flash  			media that read faster sequentially  no_bulk_read (*)	do not bulk-read @@ -98,6 +105,7 @@ auth_key=		specify the key used for authenticating the filesystem.  auth_hash_name=		The hash algorithm used for authentication. Used for  			both hashing and for creating HMACs. Typical values  			include "sha256" or "sha512" +====================	=======================================================  Quick usage instructions @@ -107,12 +115,14 @@ The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,  where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is  UBI volume name. 
-Mount volume 0 on UBI device 0 to /mnt/ubifs: -$ mount -t ubifs ubi0_0 /mnt/ubifs +Mount volume 0 on UBI device 0 to /mnt/ubifs:: + +    $ mount -t ubifs ubi0_0 /mnt/ubifs  Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume -name): -$ mount -t ubifs ubi0:rootfs /mnt/ubifs +name):: + +    $ mount -t ubifs ubi0:rootfs /mnt/ubifs  The following is an example of the kernel boot arguments to attach mtd0  to UBI and mount volume "rootfs": @@ -122,5 +132,6 @@ References  ==========  UBIFS documentation and FAQ/HOWTO at the MTD web site: -http://www.linux-mtd.infradead.org/doc/ubifs.html -http://www.linux-mtd.infradead.org/faq/ubifs.html + +- http://www.linux-mtd.infradead.org/doc/ubifs.html +- http://www.linux-mtd.infradead.org/faq/ubifs.html diff --git a/Documentation/filesystems/udf.txt b/Documentation/filesystems/udf.rst index e2f2faf32f18..d9badbf285b2 100644 --- a/Documentation/filesystems/udf.txt +++ b/Documentation/filesystems/udf.rst @@ -1,6 +1,8 @@ -* -* Documentation/filesystems/udf.txt -* +.. SPDX-License-Identifier: GPL-2.0 + +=============== +UDF file system +===============  If you encounter problems with reading UDF discs using this driver,  please report them according to MAINTAINERS file. @@ -18,8 +20,10 @@ performance due to very poor read-modify-write support supplied internally  by drive firmware.  ------------------------------------------------------------------------------- +  The following mount options are supported: +	===========	======================================  	gid=		Set the default group.  	umask=		Set the default umask.  	mode=		Set the default file permissions. @@ -34,6 +38,7 @@ The following mount options are supported:  	longad		Use long ad's (default)  	nostrict	Unset strict conformance  	iocharset=	Set the NLS character set +	===========	======================================  The uid= and gid= options need a bit more explaining.  
They will accept a  decimal numeric value and all inodes on that mount will then appear as @@ -47,13 +52,17 @@ the interactive user will always see the files on the disk as belonging to him.  The remaining are for debugging and disaster recovery: -	novrs		Skip volume sequence recognition  +	=====		================================ +	novrs		Skip volume sequence recognition +	=====		================================  The following expect a offset from 0. +	==========	=================================================  	session=	Set the CDROM session (default= last session)  	anchor=		Override standard anchor location. (default= 256)  	lastblock=	Set the last block of the filesystem/ +	==========	=================================================  ------------------------------------------------------------------------------- @@ -62,5 +71,5 @@ For the latest version and toolset see:  	https://github.com/pali/udftools  Documentation on UDF and ECMA 167 is available FREE from: -	http://www.osta.org/ -	http://www.ecma-international.org/ +	- http://www.osta.org/ +	- http://www.ecma-international.org/ diff --git a/Documentation/filesystems/virtiofs.rst b/Documentation/filesystems/virtiofs.rst index 4f338e3cb3f7..e06e4951cb39 100644 --- a/Documentation/filesystems/virtiofs.rst +++ b/Documentation/filesystems/virtiofs.rst @@ -1,5 +1,7 @@  .. SPDX-License-Identifier: GPL-2.0 +.. _virtiofs_index: +  ===================================================  virtiofs: virtio-fs host<->guest shared file system  =================================================== diff --git a/Documentation/filesystems/zonefs.txt b/Documentation/filesystems/zonefs.rst index d54fa98ac158..71d845c6a700 100644 --- a/Documentation/filesystems/zonefs.txt +++ b/Documentation/filesystems/zonefs.rst @@ -1,4 +1,8 @@ +.. 
SPDX-License-Identifier: GPL-2.0 + +================================================  ZoneFS - Zone filesystem for Zoned block devices +================================================  Introduction  ============ @@ -29,6 +33,7 @@ Zoned block devices  Zoned storage devices belong to a class of storage devices with an address  space that is divided into zones. A zone is a group of consecutive LBAs and all  zones are contiguous (there are no LBA gaps). Zones may have different types. +  * Conventional zones: there are no access constraints to LBAs belonging to    conventional zones. Any read or write access can be executed, similarly to a    regular block device. @@ -158,6 +163,7 @@ Format options  --------------  Several optional features of zonefs can be enabled at format time. +  * Conventional zone aggregation: ranges of contiguous conventional zones can be    aggregated into a single larger file instead of the default one file per zone.  * File ownership: The owner UID and GID of zone files is by default 0 (root) @@ -249,7 +255,7 @@ permissions.  Further action taken by zonefs I/O error recovery can be controlled by the user  with the "errors=xxx" mount option. The table below summarizes the result of  zonefs I/O error processing depending on the mount option and on the zone -conditions. +conditions::      +--------------+-----------+-----------------------------------------+      |              |           |            Post error state             | @@ -258,11 +264,11 @@ conditions.      
|    option    | condition | size     read    write    read    write |      +--------------+-----------+-----------------------------------------+      |              | good      | fixed    yes     no       yes     yes   | -    | remount-ro   | read-only | fixed    yes     no       yes     no    | +    | remount-ro   | read-only | as is    yes     no       yes     no    |      | (default)    | offline   |   0      no      no       no      no    |      +--------------+-----------+-----------------------------------------+      |              | good      | fixed    yes     no       yes     yes   | -    | zone-ro      | read-only | fixed    yes     no       yes     no    | +    | zone-ro      | read-only | as is    yes     no       yes     no    |      |              | offline   |   0      no      no       no      no    |      +--------------+-----------+-----------------------------------------+      |              | good      |   0      no      no       yes     yes   | @@ -270,11 +276,12 @@ conditions.      |              | offline   |   0      no      no       no      no    |      +--------------+-----------+-----------------------------------------+      |              | good      | fixed    yes     yes      yes     yes   | -    | repair       | read-only | fixed    yes     no       yes     no    | +    | repair       | read-only | as is    yes     no       yes     no    |      |              | offline   |   0      no      no       no      no    |      +--------------+-----------+-----------------------------------------+  Further notes: +  * The "errors=remount-ro" mount option is the default behavior of zonefs I/O    error processing if no errors mount option is specified.  * With the "errors=remount-ro" mount option, the change of the file access @@ -302,13 +309,22 @@ Mount options  zonefs define the "errors=<behavior>" mount option to allow the user to specify  zonefs behavior in response to I/O errors, inode size inconsistencies or zone  condition changes. 
The defined behaviors are as follow: +  * remount-ro (default)  * zone-ro  * zone-offline  * repair -The I/O error actions defined for each behavior are detailed in the previous -section. +The run-time I/O error actions defined for each behavior are detailed in the +previous section. Mount time I/O errors will cause the mount operation to fail. +The handling of read-only zones also differs between mount-time and run-time. +If a read-only zone is found at mount time, the zone is always treated in the +same manner as offline zones, that is, all accesses are disabled and the zone +file size set to 0. This is necessary as the write pointer of read-only zones +is defined as invalid by the ZBC and ZAC standards, making it impossible to +discover the amount of data that has been written to the zone. In the case of a +read-only zone discovered at run-time, as indicated in the previous section, +the size of the zone file is left unchanged from its last updated value.  Zonefs User Space Tools  ======================= @@ -325,78 +341,78 @@ Examples  --------  The following formats a 15TB host-managed SMR HDD with 256 MB zones -with the conventional zones aggregation feature enabled. +with the conventional zones aggregation feature enabled:: -# mkzonefs -o aggr_cnv /dev/sdX -# mount -t zonefs /dev/sdX /mnt -# ls -l /mnt/ -total 0 -dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv -dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq +    # mkzonefs -o aggr_cnv /dev/sdX +    # mount -t zonefs /dev/sdX /mnt +    # ls -l /mnt/ +    total 0 +    dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv +    dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq  The size of the zone files sub-directories indicate the number of files  existing for each type of zones. In this example, there is only one  conventional zone file (all conventional zones are aggregated under a single -file).
+file):: -# ls -l /mnt/cnv -total 137101312 --rw-r----- 1 root root 140391743488 Nov 25 13:23 0 +    # ls -l /mnt/cnv +    total 137101312 +    -rw-r----- 1 root root 140391743488 Nov 25 13:23 0 -This aggregated conventional zone file can be used as a regular file. +This aggregated conventional zone file can be used as a regular file:: -# mkfs.ext4 /mnt/cnv/0 -# mount -o loop /mnt/cnv/0 /data +    # mkfs.ext4 /mnt/cnv/0 +    # mount -o loop /mnt/cnv/0 /data  The "seq" sub-directory grouping files for sequential write zones has in this -example 55356 zones. +example 55356 zones:: -# ls -lv /mnt/seq -total 14511243264 --rw-r----- 1 root root 0 Nov 25 13:23 0 --rw-r----- 1 root root 0 Nov 25 13:23 1 --rw-r----- 1 root root 0 Nov 25 13:23 2 -... --rw-r----- 1 root root 0 Nov 25 13:23 55354 --rw-r----- 1 root root 0 Nov 25 13:23 55355 +    # ls -lv /mnt/seq +    total 14511243264 +    -rw-r----- 1 root root 0 Nov 25 13:23 0 +    -rw-r----- 1 root root 0 Nov 25 13:23 1 +    -rw-r----- 1 root root 0 Nov 25 13:23 2 +    ... +    -rw-r----- 1 root root 0 Nov 25 13:23 55354 +    -rw-r----- 1 root root 0 Nov 25 13:23 55355  For sequential write zone files, the file size changes as data is appended at -the end of the file, similarly to any regular file system. +the end of the file, similarly to any regular file system:: -# dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct -1+0 records in -1+0 records out -4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s +    # dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct +    1+0 records in +    1+0 records out +    4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s -# ls -l /mnt/seq/0 --rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0 +    # ls -l /mnt/seq/0 +    -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0  The written file can be truncated to the zone size, preventing any further -write operation. 
+write operation:: -# truncate -s 268435456 /mnt/seq/0 -# ls -l /mnt/seq/0 --rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0 +    # truncate -s 268435456 /mnt/seq/0 +    # ls -l /mnt/seq/0 +    -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0  Truncation to 0 size allows freeing the file zone storage space and restart -append-writes to the file. +append-writes to the file:: -# truncate -s 0 /mnt/seq/0 -# ls -l /mnt/seq/0 --rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0 +    # truncate -s 0 /mnt/seq/0 +    # ls -l /mnt/seq/0 +    -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0  Since files are statically mapped to zones on the disk, the number of blocks of -a file as reported by stat() and fstat() indicates the size of the file zone. - -# stat /mnt/seq/0 -  File: /mnt/seq/0 -  Size: 0         	Blocks: 524288     IO Block: 4096   regular empty file -Device: 870h/2160d	Inode: 50431       Links: 1 -Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root) -Access: 2019-11-25 13:23:57.048971997 +0900 -Modify: 2019-11-25 13:52:25.553805765 +0900 -Change: 2019-11-25 13:52:25.553805765 +0900 - Birth: - +a file as reported by stat() and fstat() indicates the size of the file zone:: + +    # stat /mnt/seq/0 +    File: /mnt/seq/0 +    Size: 0         	Blocks: 524288     IO Block: 4096   regular empty file +    Device: 870h/2160d	Inode: 50431       Links: 1 +    Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root) +    Access: 2019-11-25 13:23:57.048971997 +0900 +    Modify: 2019-11-25 13:52:25.553805765 +0900 +    Change: 2019-11-25 13:52:25.553805765 +0900 +    Birth: -  The number of blocks of the file ("Blocks") in units of 512B blocks gives the  maximum file size of 524288 * 512 B = 256 MB, corresponding to the device zone |