diff options
Diffstat (limited to 'Documentation/core-api')
-rw-r--r-- | Documentation/core-api/folio_queue.rst | 212 | ||||
-rw-r--r-- | Documentation/core-api/index.rst | 1 | ||||
-rw-r--r-- | Documentation/core-api/protection-keys.rst | 38 | ||||
-rw-r--r-- | Documentation/core-api/unaligned-memory-access.rst | 2 |
4 files changed, 244 insertions, 9 deletions
diff --git a/Documentation/core-api/folio_queue.rst b/Documentation/core-api/folio_queue.rst new file mode 100644 index 000000000000..1fe7a9bc4b8d --- /dev/null +++ b/Documentation/core-api/folio_queue.rst @@ -0,0 +1,212 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +=========== +Folio Queue +=========== + +:Author: David Howells <[email protected]> + +.. Contents: + + * Overview + * Initialisation + * Adding and removing folios + * Querying information about a folio + * Querying information about a folio_queue + * Folio queue iteration + * Folio marks + * Lockless simultaneous production/consumption issues + + +Overview +======== + +The folio_queue struct forms a single segment in a segmented list of folios +that can be used to form an I/O buffer. As such, the list can be iterated over +using the ITER_FOLIOQ iov_iter type. + +The publicly accessible members of the structure are:: + + struct folio_queue { + struct folio_queue *next; + struct folio_queue *prev; + ... + }; + +A pair of pointers are provided, ``next`` and ``prev``, that point to the +segments on either side of the segment being accessed. Whilst this is a +doubly-linked list, it is intentionally not a circular list; the outward +sibling pointers in terminal segments should be NULL. + +Each segment in the list also stores: + + * an ordered sequence of folio pointers, + * the size of each folio and + * three 1-bit marks per folio, + +but hese should not be accessed directly as the underlying data structure may +change, but rather the access functions outlined below should be used. + +The facility can be made accessible by:: + + #include <linux/folio_queue.h> + +and to use the iterator:: + + #include <linux/uio.h> + + +Initialisation +============== + +A segment should be initialised by calling:: + + void folioq_init(struct folio_queue *folioq); + +with a pointer to the segment to be initialised. Note that this will not +necessarily initialise all the folio pointers, so care must be taken to check +the number of folios added. + + +Adding and removing folios +========================== + +Folios can be set in the next unused slot in a segment struct by calling one +of:: + + unsigned int folioq_append(struct folio_queue *folioq, + struct folio *folio); + + unsigned int folioq_append_mark(struct folio_queue *folioq, + struct folio *folio); + +Both functions update the stored folio count, store the folio and note its +size. The second function also sets the first mark for the folio added. Both +functions return the number of the slot used. [!] Note that no attempt is made +to check that the capacity wasn't overrun and the list will not be extended +automatically. + +A folio can be excised by calling:: + + void folioq_clear(struct folio_queue *folioq, unsigned int slot); + +This clears the slot in the array and also clears all the marks for that folio, +but doesn't change the folio count - so future accesses of that slot must check +if the slot is occupied. + + +Querying information about a folio +================================== + +Information about the folio in a particular slot may be queried by the +following function:: + + struct folio *folioq_folio(const struct folio_queue *folioq, + unsigned int slot); + +If a folio has not yet been set in that slot, this may yield an undefined +pointer. The size of the folio in a slot may be queried with either of:: + + unsigned int folioq_folio_order(const struct folio_queue *folioq, + unsigned int slot); + + size_t folioq_folio_size(const struct folio_queue *folioq, + unsigned int slot); + +The first function returns the size as an order and the second as a number of +bytes. + + +Querying information about a folio_queue +======================================== + +Information may be retrieved about a particular segment with the following +functions:: + + unsigned int folioq_nr_slots(const struct folio_queue *folioq); + + unsigned int folioq_count(struct folio_queue *folioq); + + bool folioq_full(struct folio_queue *folioq); + +The first function returns the maximum capacity of a segment. It must not be +assumed that this won't vary between segments. The second returns the number +of folios added to a segments and the third is a shorthand to indicate if the +segment has been filled to capacity. + +Not that the count and fullness are not affected by clearing folios from the +segment. These are more about indicating how many slots in the array have been +initialised, and it assumed that slots won't get reused, but rather the segment +will get discarded as the queue is consumed. + + +Folio marks +=========== + +Folios within a queue can also have marks assigned to them. These marks can be +used to note information such as if a folio needs folio_put() calling upon it. +There are three marks available to be set for each folio. + +The marks can be set by:: + + void folioq_mark(struct folio_queue *folioq, unsigned int slot); + void folioq_mark2(struct folio_queue *folioq, unsigned int slot); + void folioq_mark3(struct folio_queue *folioq, unsigned int slot); + +Cleared by:: + + void folioq_unmark(struct folio_queue *folioq, unsigned int slot); + void folioq_unmark2(struct folio_queue *folioq, unsigned int slot); + void folioq_unmark3(struct folio_queue *folioq, unsigned int slot); + +And the marks can be queried by:: + + bool folioq_is_marked(const struct folio_queue *folioq, unsigned int slot); + bool folioq_is_marked2(const struct folio_queue *folioq, unsigned int slot); + bool folioq_is_marked3(const struct folio_queue *folioq, unsigned int slot); + +The marks can be used for any purpose and are not interpreted by this API. + + +Folio queue iteration +===================== + +A list of segments may be iterated over using the I/O iterator facility using +an ``iov_iter`` iterator of ``ITER_FOLIOQ`` type. The iterator may be +initialised with:: + + void iov_iter_folio_queue(struct iov_iter *i, unsigned int direction, + const struct folio_queue *folioq, + unsigned int first_slot, unsigned int offset, + size_t count); + +This may be told to start at a particular segment, slot and offset within a +queue. The iov iterator functions will follow the next pointers when advancing +and prev pointers when reverting when needed. + + +Lockless simultaneous production/consumption issues +=================================================== + +If properly managed, the list can be extended by the producer at the head end +and shortened by the consumer at the tail end simultaneously without the need +to take locks. The ITER_FOLIOQ iterator inserts appropriate barriers to aid +with this. + +Care must be taken when simultaneously producing and consuming a list. If the +last segment is reached and the folios it refers to are entirely consumed by +the IOV iterators, an iov_iter struct will be left pointing to the last segment +with a slot number equal to the capacity of that segment. The iterator will +try to continue on from this if there's another segment available when it is +used again, but care must be taken lest the segment got removed and freed by +the consumer before the iterator was advanced. + +It is recommended that the queue always contain at least one segment, even if +that segment has never been filled or is entirely spent. This prevents the +head and tail pointers from collapsing. + + +API Function Reference +====================== + +.. kernel-doc:: include/linux/folio_queue.h diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index a331d2c814f5..6a875743dd4b 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -37,6 +37,7 @@ Library functionality that is used throughout the kernel. kref cleanup assoc_array + folio_queue xarray maple_tree idr diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index bf28ac0401f3..7eb7c6023e09 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -12,7 +12,10 @@ Pkeys Userspace (PKU) is a feature which can be found on: * Intel server CPUs, Skylake and later * Intel client CPUs, Tiger Lake (11th Gen Core) and later * Future AMD CPUs + * arm64 CPUs implementing the Permission Overlay Extension (FEAT_S1POE) +x86_64 +====== Pkeys work by dedicating 4 previously Reserved bits in each page table entry to a "protection key", giving 16 possible keys. @@ -28,6 +31,22 @@ register. The feature is only available in 64-bit mode, even though there is theoretically space in the PAE PTEs. These permissions are enforced on data access only and have no effect on instruction fetches. +arm64 +===== + +Pkeys use 3 bits in each page table entry, to encode a "protection key index", +giving 8 possible keys. + +Protections for each key are defined with a per-CPU user-writable system +register (POR_EL0). This is a 64-bit register encoding read, write and execute +overlay permissions for each protection key index. + +Being a CPU register, POR_EL0 is inherently thread-local, potentially giving +each thread a different set of protections from every other thread. + +Unlike x86_64, the protection key permissions also apply to instruction +fetches. + Syscalls ======== @@ -38,11 +57,10 @@ There are 3 system calls which directly interact with pkeys:: int pkey_mprotect(unsigned long start, size_t len, unsigned long prot, int pkey); -Before a pkey can be used, it must first be allocated with -pkey_alloc(). An application calls the WRPKRU instruction -directly in order to change access permissions to memory covered -with a key. In this example WRPKRU is wrapped by a C function -called pkey_set(). +Before a pkey can be used, it must first be allocated with pkey_alloc(). An +application writes to the architecture specific CPU register directly in order +to change access permissions to memory covered with a key. In this example +this is wrapped by a C function called pkey_set(). :: int real_prot = PROT_READ|PROT_WRITE; @@ -64,9 +82,9 @@ is no longer in use:: munmap(ptr, PAGE_SIZE); pkey_free(pkey); -.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions. - An example implementation can be found in - tools/testing/selftests/x86/protection_keys.c. +.. note:: pkey_set() is a wrapper around writing to the CPU register. + Example implementations can be found in + tools/testing/selftests/mm/pkey-{arm64,powerpc,x86}.h Behavior ======== @@ -96,3 +114,7 @@ with a read():: The kernel will send a SIGSEGV in both cases, but si_code will be set to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when the plain mprotect() permissions are violated. + +Note that kernel accesses from a kthread (such as io_uring) will use a default +value for the protection key register and so will not be consistent with +userspace's value of the register or mprotect(). diff --git a/Documentation/core-api/unaligned-memory-access.rst b/Documentation/core-api/unaligned-memory-access.rst index 1ee82419d8aa..5ceeb80eb539 100644 --- a/Documentation/core-api/unaligned-memory-access.rst +++ b/Documentation/core-api/unaligned-memory-access.rst @@ -203,7 +203,7 @@ Avoiding unaligned accesses =========================== The easiest way to avoid unaligned access is to use the get_unaligned() and -put_unaligned() macros provided by the <asm/unaligned.h> header file. +put_unaligned() macros provided by the <linux/unaligned.h> header file. Going back to an earlier example of code that potentially causes unaligned access:: |