4 files changed, 244 insertions, 9 deletions
diff --git a/Documentation/core-api/folio_queue.rst b/Documentation/core-api/folio_queue.rst
new file mode 100644
index 000000000000..1fe7a9bc4b8d
--- /dev/null
+++ b/Documentation/core-api/folio_queue.rst
@@ -0,0 +1,212 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+===========
+Folio Queue
+===========
+
+:Author: David Howells <[email protected]>
+
+.. Contents:
+
+ * Overview
+ * Initialisation
+ * Adding and removing folios
+ * Querying information about a folio
+ * Querying information about a folio_queue
+ * Folio queue iteration
+ * Folio marks
+ * Lockless simultaneous production/consumption issues
+
+
+Overview
+========
+
+The folio_queue struct forms a single segment in a segmented list of folios
+that can be used to form an I/O buffer.  As such, the list can be iterated over
+using the ITER_FOLIOQ iov_iter type.
+
+The publicly accessible members of the structure are::
+
+	struct folio_queue {
+		struct folio_queue *next;
+		struct folio_queue *prev;
+		...
+	};
+
+A pair of pointers are provided, ``next`` and ``prev``, that point to the
+segments on either side of the segment being accessed.  Whilst this is a
+doubly-linked list, it is intentionally not a circular list; the outward
+sibling pointers in terminal segments should be NULL.
+
+Each segment in the list also stores:
+
+ * an ordered sequence of folio pointers,
+ * the size of each folio and
+ * three 1-bit marks per folio,
+
+but hese should not be accessed directly as the underlying data structure may
+change, but rather the access functions outlined below should be used.
+
+The facility can be made accessible by::
+
+	#include <linux/folio_queue.h>
+
+and to use the iterator::
+
+	#include <linux/uio.h>
+
+
+Initialisation
+==============
+
+A segment should be initialised by calling::
+
+	void folioq_init(struct folio_queue *folioq);
+
+with a pointer to the segment to be initialised.  Note that this will not
+necessarily initialise all the folio pointers, so care must be taken to check
+the number of folios added.
+
+
+Adding and removing folios
+==========================
+
+Folios can be set in the next unused slot in a segment struct by calling one
+of::
+
+	unsigned int folioq_append(struct folio_queue *folioq,
+				   struct folio *folio);
+
+	unsigned int folioq_append_mark(struct folio_queue *folioq,
+					struct folio *folio);
+
+Both functions update the stored folio count, store the folio and note its
+size.  The second function also sets the first mark for the folio added.  Both
+functions return the number of the slot used.  [!] Note that no attempt is made
+to check that the capacity wasn't overrun and the list will not be extended
+automatically.
+
+A folio can be excised by calling::
+
+	void folioq_clear(struct folio_queue *folioq, unsigned int slot);
+
+This clears the slot in the array and also clears all the marks for that folio,
+but doesn't change the folio count - so future accesses of that slot must check
+if the slot is occupied.
+
+
+Querying information about a folio
+==================================
+
+Information about the folio in a particular slot may be queried by the
+following function::
+
+	struct folio *folioq_folio(const struct folio_queue *folioq,
+				   unsigned int slot);
+
+If a folio has not yet been set in that slot, this may yield an undefined
+pointer.  The size of the folio in a slot may be queried with either of::
+
+	unsigned int folioq_folio_order(const struct folio_queue *folioq,
+					unsigned int slot);
+
+	size_t folioq_folio_size(const struct folio_queue *folioq,
+				 unsigned int slot);
+
+The first function returns the size as an order and the second as a number of
+bytes.
+
+
+Querying information about a folio_queue
+========================================
+
+Information may be retrieved about a particular segment with the following
+functions::
+
+	unsigned int folioq_nr_slots(const struct folio_queue *folioq);
+
+	unsigned int folioq_count(struct folio_queue *folioq);
+
+	bool folioq_full(struct folio_queue *folioq);
+
+The first function returns the maximum capacity of a segment.  It must not be
+assumed that this won't vary between segments.  The second returns the number
+of folios added to a segments and the third is a shorthand to indicate if the
+segment has been filled to capacity.
+
+Not that the count and fullness are not affected by clearing folios from the
+segment.  These are more about indicating how many slots in the array have been
+initialised, and it assumed that slots won't get reused, but rather the segment
+will get discarded as the queue is consumed.
+
+
+Folio marks
+===========
+
+Folios within a queue can also have marks assigned to them.  These marks can be
+used to note information such as if a folio needs folio_put() calling upon it.
+There are three marks available to be set for each folio.
+
+The marks can be set by::
+
+	void folioq_mark(struct folio_queue *folioq, unsigned int slot);
+	void folioq_mark2(struct folio_queue *folioq, unsigned int slot);
+	void folioq_mark3(struct folio_queue *folioq, unsigned int slot);
+
+Cleared by::
+
+	void folioq_unmark(struct folio_queue *folioq, unsigned int slot);
+	void folioq_unmark2(struct folio_queue *folioq, unsigned int slot);
+	void folioq_unmark3(struct folio_queue *folioq, unsigned int slot);
+
+And the marks can be queried by::
+
+	bool folioq_is_marked(const struct folio_queue *folioq, unsigned int slot);
+	bool folioq_is_marked2(const struct folio_queue *folioq, unsigned int slot);
+	bool folioq_is_marked3(const struct folio_queue *folioq, unsigned int slot);
+
+The marks can be used for any purpose and are not interpreted by this API.
+
+
+Folio queue iteration
+=====================
+
+A list of segments may be iterated over using the I/O iterator facility using
+an ``iov_iter`` iterator of ``ITER_FOLIOQ`` type.  The iterator may be
+initialised with::
+
+	void iov_iter_folio_queue(struct iov_iter *i, unsigned int direction,
+				  const struct folio_queue *folioq,
+				  unsigned int first_slot, unsigned int offset,
+				  size_t count);
+
+This may be told to start at a particular segment, slot and offset within a
+queue.  The iov iterator functions will follow the next pointers when advancing
+and prev pointers when reverting when needed.
+
+
+Lockless simultaneous production/consumption issues
+===================================================
+
+If properly managed, the list can be extended by the producer at the head end
+and shortened by the consumer at the tail end simultaneously without the need
+to take locks.  The ITER_FOLIOQ iterator inserts appropriate barriers to aid
+with this.
+
+Care must be taken when simultaneously producing and consuming a list.  If the
+last segment is reached and the folios it refers to are entirely consumed by
+the IOV iterators, an iov_iter struct will be left pointing to the last segment
+with a slot number equal to the capacity of that segment.  The iterator will
+try to continue on from this if there's another segment available when it is
+used again, but care must be taken lest the segment got removed and freed by
+the consumer before the iterator was advanced.
+
+It is recommended that the queue always contain at least one segment, even if
+that segment has never been filled or is entirely spent.  This prevents the
+head and tail pointers from collapsing.
+
+
+API Function Reference
+======================
+
+.. kernel-doc:: include/linux/folio_queue.h
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index a331d2c814f5..6a875743dd4b 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -37,6 +37,7 @@ Library functionality that is used throughout the kernel.
    kref
    cleanup
    assoc_array
+   folio_queue
    xarray
    maple_tree
    idr
diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index bf28ac0401f3..7eb7c6023e09 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -12,7 +12,10 @@ Pkeys Userspace (PKU) is a feature which can be found on:
         * Intel server CPUs, Skylake and later
         * Intel client CPUs, Tiger Lake (11th Gen Core) and later
         * Future AMD CPUs
+        * arm64 CPUs implementing the Permission Overlay Extension (FEAT_S1POE)
 
+x86_64
+======
 Pkeys work by dedicating 4 previously Reserved bits in each page table entry to
 a "protection key", giving 16 possible keys.
 
@@ -28,6 +31,22 @@ register.  The feature is only available in 64-bit mode, even though there is
 theoretically space in the PAE PTEs.  These permissions are enforced on data
 access only and have no effect on instruction fetches.
 
+arm64
+=====
+
+Pkeys use 3 bits in each page table entry, to encode a "protection key index",
+giving 8 possible keys.
+
+Protections for each key are defined with a per-CPU user-writable system
+register (POR_EL0).  This is a 64-bit register encoding read, write and execute
+overlay permissions for each protection key index.
+
+Being a CPU register, POR_EL0 is inherently thread-local, potentially giving
+each thread a different set of protections from every other thread.
+
+Unlike x86_64, the protection key permissions also apply to instruction
+fetches.
+
 Syscalls
 ========
 
@@ -38,11 +57,10 @@ There are 3 system calls which directly interact with pkeys::
 	int pkey_mprotect(unsigned long start, size_t len,
 			  unsigned long prot, int pkey);
 
-Before a pkey can be used, it must first be allocated with
-pkey_alloc().  An application calls the WRPKRU instruction
-directly in order to change access permissions to memory covered
-with a key.  In this example WRPKRU is wrapped by a C function
-called pkey_set().
+Before a pkey can be used, it must first be allocated with pkey_alloc().  An
+application writes to the architecture specific CPU register directly in order
+to change access permissions to memory covered with a key.  In this example
+this is wrapped by a C function called pkey_set().
 ::
 
 	int real_prot = PROT_READ|PROT_WRITE;
@@ -64,9 +82,9 @@ is no longer in use::
 	munmap(ptr, PAGE_SIZE);
 	pkey_free(pkey);
 
-.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
-          An example implementation can be found in
-          tools/testing/selftests/x86/protection_keys.c.
+.. note:: pkey_set() is a wrapper around writing to the CPU register.
+          Example implementations can be found in
+          tools/testing/selftests/mm/pkey-{arm64,powerpc,x86}.h
 
 Behavior
 ========
@@ -96,3 +114,7 @@ with a read()::
 The kernel will send a SIGSEGV in both cases, but si_code will be set
 to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
 the plain mprotect() permissions are violated.
+
+Note that kernel accesses from a kthread (such as io_uring) will use a default
+value for the protection key register and so will not be consistent with
+userspace's value of the register or mprotect().
diff --git a/Documentation/core-api/unaligned-memory-access.rst b/Documentation/core-api/unaligned-memory-access.rst
index 1ee82419d8aa..5ceeb80eb539 100644
--- a/Documentation/core-api/unaligned-memory-access.rst
+++ b/Documentation/core-api/unaligned-memory-access.rst
@@ -203,7 +203,7 @@ Avoiding unaligned accesses
 ===========================
 
 The easiest way to avoid unaligned access is to use the get_unaligned() and
-put_unaligned() macros provided by the <asm/unaligned.h> header file.
+put_unaligned() macros provided by the <linux/unaligned.h> header file.
 
 Going back to an earlier example of code that potentially causes unaligned
 access::