aboutsummaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/LSM/ipe.rst7
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt2
-rw-r--r--Documentation/admin-guide/mm/transhuge.rst2
-rw-r--r--Documentation/admin-guide/pm/cpufreq.rst20
-rw-r--r--Documentation/core-api/protection-keys.rst38
-rw-r--r--Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml24
-rw-r--r--Documentation/devicetree/bindings/display/mediatek/mediatek,split.yaml19
-rw-r--r--Documentation/devicetree/bindings/firmware/arm,scmi.yaml2
-rw-r--r--Documentation/devicetree/bindings/iio/adc/adi,ad7380.yaml21
-rw-r--r--Documentation/devicetree/bindings/iio/dac/adi,ad5686.yaml53
-rw-r--r--Documentation/devicetree/bindings/iio/dac/adi,ad5696.yaml3
-rw-r--r--Documentation/devicetree/bindings/iommu/arm,smmu.yaml5
-rw-r--r--Documentation/devicetree/bindings/iommu/riscv,iommu.yaml147
-rw-r--r--Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml1
-rw-r--r--Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml2
-rw-r--r--Documentation/devicetree/bindings/phy/qcom,sc8280xp-qmp-pcie-phy.yaml5
-rw-r--r--Documentation/devicetree/bindings/sound/davinci-mcasp-audio.yaml18
-rw-r--r--Documentation/devicetree/bindings/sound/rockchip,rk3308-codec.yaml4
-rw-r--r--Documentation/filesystems/caching/cachefiles.rst2
-rw-r--r--Documentation/filesystems/iomap/operations.rst2
-rw-r--r--Documentation/filesystems/netfs_library.rst1
-rw-r--r--Documentation/iio/ad7380.rst13
-rw-r--r--Documentation/mm/damon/maintainer-profile.rst38
-rw-r--r--Documentation/netlink/specs/mptcp_pm.yaml1
-rw-r--r--Documentation/networking/j1939.rst2
-rw-r--r--Documentation/networking/packet_mmap.rst5
-rw-r--r--Documentation/process/maintainer-soc.rst42
-rw-r--r--Documentation/rust/arch-support.rst2
-rw-r--r--Documentation/userspace-api/mseal.rst307
-rw-r--r--Documentation/virt/kvm/api.rst16
-rw-r--r--Documentation/virt/kvm/locking.rst2
31 files changed, 519 insertions, 287 deletions
diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst
index f38e641df0e9..f93a467db628 100644
--- a/Documentation/admin-guide/LSM/ipe.rst
+++ b/Documentation/admin-guide/LSM/ipe.rst
@@ -223,7 +223,10 @@ are signed through the PKCS#7 message format to enforce some level of
authorization of the policies (prohibiting an attacker from gaining
unconstrained root, and deploying an "allow all" policy). These
policies must be signed by a certificate that chains to the
-``SYSTEM_TRUSTED_KEYRING``. With openssl, the policy can be signed by::
+``SYSTEM_TRUSTED_KEYRING``, or to the secondary and/or platform keyrings if
+``CONFIG_IPE_POLICY_SIG_SECONDARY_KEYRING`` and/or
+``CONFIG_IPE_POLICY_SIG_PLATFORM_KEYRING`` are enabled, respectively.
+With openssl, the policy can be signed by::
openssl smime -sign \
-in "$MY_POLICY" \
@@ -266,7 +269,7 @@ in the kernel. This file is write-only and accepts a PKCS#7 signed
policy. Two checks will always be performed on this policy: First, the
``policy_names`` must match with the updated version and the existing
version. Second the updated policy must have a policy version greater than
-or equal to the currently-running version. This is to prevent rollback attacks.
+the currently-running version. This is to prevent rollback attacks.
The ``delete`` file is used to remove a policy that is no longer needed.
This file is write-only and accepts a value of ``1`` to delete the policy.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1518343bbe22..1666576acc0e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6688,7 +6688,7 @@
0: no polling (default)
thp_anon= [KNL]
- Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>
+ Format: <size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>
state is one of "always", "madvise", "never" or "inherit".
Control the default behavior of the system with respect
to anonymous transparent hugepages.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index cfdd16a52e39..a1bb495eab59 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -303,7 +303,7 @@ control by passing the parameter ``transparent_hugepage=always`` or
kernel command line.
Alternatively, each supported anonymous THP size can be controlled by
-passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``,
+passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
supported anonymous THP) and ``<state>`` is one of ``always``, ``madvise``,
``never`` or ``inherit``.
diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index fe1be4ad88cb..a21369eba034 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -425,8 +425,8 @@ This governor exposes only one tunable:
``rate_limit_us``
Minimum time (in microseconds) that has to pass between two consecutive
- runs of governor computations (default: 1000 times the scaling driver's
- transition latency).
+ runs of governor computations (default: 1.5 times the scaling driver's
+ transition latency or the maximum 2ms).
The purpose of this tunable is to reduce the scheduler context overhead
of the governor which might be excessive without it.
@@ -474,17 +474,17 @@ This governor exposes the following tunables:
This is how often the governor's worker routine should run, in
microseconds.
- Typically, it is set to values of the order of 10000 (10 ms). Its
- default value is equal to the value of ``cpuinfo_transition_latency``
- for each policy this governor is attached to (but since the unit here
- is greater by 1000, this means that the time represented by
- ``sampling_rate`` is 1000 times greater than the transition latency by
- default).
+ Typically, it is set to values of the order of 2000 (2 ms). Its
+ default value is to add a 50% breathing room
+ to ``cpuinfo_transition_latency`` on each policy this governor is
+ attached to. The minimum is typically the length of two scheduler
+ ticks.
If this tunable is per-policy, the following shell command sets the time
- represented by it to be 750 times as high as the transition latency::
+ represented by it to be 1.5 times as high as the transition latency
+ (the default)::
- # echo `$(($(cat cpuinfo_transition_latency) * 750 / 1000)) > ondemand/sampling_rate
+ # echo `$(($(cat cpuinfo_transition_latency) * 3 / 2)) > ondemand/sampling_rate
``up_threshold``
If the estimated CPU load is above this value (in percent), the governor
diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index bf28ac0401f3..7eb7c6023e09 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -12,7 +12,10 @@ Pkeys Userspace (PKU) is a feature which can be found on:
* Intel server CPUs, Skylake and later
* Intel client CPUs, Tiger Lake (11th Gen Core) and later
* Future AMD CPUs
+ * arm64 CPUs implementing the Permission Overlay Extension (FEAT_S1POE)
+x86_64
+======
Pkeys work by dedicating 4 previously Reserved bits in each page table entry to
a "protection key", giving 16 possible keys.
@@ -28,6 +31,22 @@ register. The feature is only available in 64-bit mode, even though there is
theoretically space in the PAE PTEs. These permissions are enforced on data
access only and have no effect on instruction fetches.
+arm64
+=====
+
+Pkeys use 3 bits in each page table entry, to encode a "protection key index",
+giving 8 possible keys.
+
+Protections for each key are defined with a per-CPU user-writable system
+register (POR_EL0). This is a 64-bit register encoding read, write and execute
+overlay permissions for each protection key index.
+
+Being a CPU register, POR_EL0 is inherently thread-local, potentially giving
+each thread a different set of protections from every other thread.
+
+Unlike x86_64, the protection key permissions also apply to instruction
+fetches.
+
Syscalls
========
@@ -38,11 +57,10 @@ There are 3 system calls which directly interact with pkeys::
int pkey_mprotect(unsigned long start, size_t len,
unsigned long prot, int pkey);
-Before a pkey can be used, it must first be allocated with
-pkey_alloc(). An application calls the WRPKRU instruction
-directly in order to change access permissions to memory covered
-with a key. In this example WRPKRU is wrapped by a C function
-called pkey_set().
+Before a pkey can be used, it must first be allocated with pkey_alloc(). An
+application writes to the architecture specific CPU register directly in order
+to change access permissions to memory covered with a key. In this example
+this is wrapped by a C function called pkey_set().
::
int real_prot = PROT_READ|PROT_WRITE;
@@ -64,9 +82,9 @@ is no longer in use::
munmap(ptr, PAGE_SIZE);
pkey_free(pkey);
-.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
- An example implementation can be found in
- tools/testing/selftests/x86/protection_keys.c.
+.. note:: pkey_set() is a wrapper around writing to the CPU register.
+ Example implementations can be found in
+ tools/testing/selftests/mm/pkey-{arm64,powerpc,x86}.h
Behavior
========
@@ -96,3 +114,7 @@ with a read()::
The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
+
+Note that kernel accesses from a kthread (such as io_uring) will use a default
+value for the protection key register and so will not be consistent with
+userspace's value of the register or mprotect().
diff --git a/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml b/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml
index 3a82aec9021c..497c0eb4ed0b 100644
--- a/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml
+++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,dpi.yaml
@@ -63,6 +63,16 @@ properties:
- const: sleep
power-domains:
+ description: |
+ The MediaTek DPI module is typically associated with one of the
+ following multimedia power domains:
+ POWER_DOMAIN_DISPLAY
+ POWER_DOMAIN_VDOSYS
+ POWER_DOMAIN_MM
+ The specific power domain used varies depending on the SoC design.
+
+ It is recommended to explicitly add the appropriate power domain
+ property to the DPI node in the device tree.
maxItems: 1
port:
@@ -79,20 +89,6 @@ required:
- clock-names
- port
-allOf:
- - if:
- not:
- properties:
- compatible:
- contains:
- enum:
- - mediatek,mt6795-dpi
- - mediatek,mt8173-dpi
- - mediatek,mt8186-dpi
- then:
- properties:
- power-domains: false
-
additionalProperties: false
examples:
diff --git a/Documentation/devicetree/bindings/display/mediatek/mediatek,split.yaml b/Documentation/devicetree/bindings/display/mediatek/mediatek,split.yaml
index e4affc854f3d..4b6ff546757e 100644
--- a/Documentation/devicetree/bindings/display/mediatek/mediatek,split.yaml
+++ b/Documentation/devicetree/bindings/display/mediatek/mediatek,split.yaml
@@ -38,6 +38,7 @@ properties:
description: A phandle and PM domain specifier as defined by bindings of
the power controller specified by phandle. See
Documentation/devicetree/bindings/power/power-domain.yaml for details.
+ maxItems: 1
mediatek,gce-client-reg:
description:
@@ -57,6 +58,9 @@ properties:
clocks:
items:
- description: SPLIT Clock
+ - description: Used for interfacing with the HDMI RX signal source.
+ - description: Paired with receiving HDMI RX metadata.
+ minItems: 1
required:
- compatible
@@ -72,9 +76,24 @@ allOf:
const: mediatek,mt8195-mdp3-split
then:
+ properties:
+ clocks:
+ minItems: 3
+
required:
- mediatek,gce-client-reg
+ - if:
+ properties:
+ compatible:
+ contains:
+ const: mediatek,mt8173-disp-split
+
+ then:
+ properties:
+ clocks:
+ maxItems: 1
+
additionalProperties: false
examples:
diff --git a/Documentation/devicetree/bindings/firmware/arm,scmi.yaml b/Documentation/devicetree/bindings/firmware/arm,scmi.yaml
index 54d7d11bfed4..ff7a6f12cd00 100644
--- a/Documentation/devicetree/bindings/firmware/arm,scmi.yaml
+++ b/Documentation/devicetree/bindings/firmware/arm,scmi.yaml
@@ -124,7 +124,7 @@ properties:
atomic mode of operation, even if requested.
default: 0
- max-rx-timeout-ms:
+ arm,max-rx-timeout-ms:
description:
An optional time value, expressed in milliseconds, representing the
transport maximum timeout value for the receive channel. The value should
diff --git a/Documentation/devicetree/bindings/iio/adc/adi,ad7380.yaml b/Documentation/devicetree/bindings/iio/adc/adi,ad7380.yaml
index bd19abb867d9..0065d6508824 100644
--- a/Documentation/devicetree/bindings/iio/adc/adi,ad7380.yaml
+++ b/Documentation/devicetree/bindings/iio/adc/adi,ad7380.yaml
@@ -67,6 +67,10 @@ properties:
A 2.5V to 3.3V supply for the external reference voltage. When omitted,
the internal 2.5V reference is used.
+ refin-supply:
+ description:
+ A 2.5V to 3.3V supply for external reference voltage, for ad7380-4 only.
+
aina-supply:
description:
The common mode voltage supply for the AINA- pin on pseudo-differential
@@ -135,6 +139,23 @@ allOf:
ainc-supply: false
aind-supply: false
+ # ad7380-4 uses refin-supply as external reference.
+ # All other chips from ad738x family use refio as optional external reference.
+ # When refio-supply is omitted, internal reference is used.
+ - if:
+ properties:
+ compatible:
+ enum:
+ - adi,ad7380-4
+ then:
+ properties:
+ refio-supply: false
+ required:
+ - refin-supply
+ else:
+ properties:
+ refin-supply: false
+
examples:
- |
#include <dt-bindings/interrupt-controller/irq.h>
diff --git a/Documentation/devicetree/bindings/iio/dac/adi,ad5686.yaml b/Documentation/devicetree/bindings/iio/dac/adi,ad5686.yaml
index b4400c52bec3..713f535bb33a 100644
--- a/Documentation/devicetree/bindings/iio/dac/adi,ad5686.yaml
+++ b/Documentation/devicetree/bindings/iio/dac/adi,ad5686.yaml
@@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/iio/dac/adi,ad5686.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
-title: Analog Devices AD5360 and similar DACs
+title: Analog Devices AD5360 and similar SPI DACs
maintainers:
- Michael Hennerich <[email protected]>
@@ -12,41 +12,22 @@ maintainers:
properties:
compatible:
- oneOf:
- - description: SPI devices
- enum:
- - adi,ad5310r
- - adi,ad5672r
- - adi,ad5674r
- - adi,ad5676
- - adi,ad5676r
- - adi,ad5679r
- - adi,ad5681r
- - adi,ad5682r
- - adi,ad5683
- - adi,ad5683r
- - adi,ad5684
- - adi,ad5684r
- - adi,ad5685r
- - adi,ad5686
- - adi,ad5686r
- - description: I2C devices
- enum:
- - adi,ad5311r
- - adi,ad5337r
- - adi,ad5338r
- - adi,ad5671r
- - adi,ad5675r
- - adi,ad5691r
- - adi,ad5692r
- - adi,ad5693
- - adi,ad5693r
- - adi,ad5694
- - adi,ad5694r
- - adi,ad5695r
- - adi,ad5696
- - adi,ad5696r
-
+ enum:
+ - adi,ad5310r
+ - adi,ad5672r
+ - adi,ad5674r
+ - adi,ad5676
+ - adi,ad5676r
+ - adi,ad5679r
+ - adi,ad5681r
+ - adi,ad5682r
+ - adi,ad5683
+ - adi,ad5683r
+ - adi,ad5684
+ - adi,ad5684r
+ - adi,ad5685r
+ - adi,ad5686
+ - adi,ad5686r
reg:
maxItems: 1
diff --git a/Documentation/devicetree/bindings/iio/dac/adi,ad5696.yaml b/Documentation/devicetree/bindings/iio/dac/adi,ad5696.yaml
index 56b0cda0f30a..b5a88b03dc2f 100644
--- a/Documentation/devicetree/bindings/iio/dac/adi,ad5696.yaml
+++ b/Documentation/devicetree/bindings/iio/dac/adi,ad5696.yaml
@@ -4,7 +4,7 @@
$id: http://devicetree.org/schemas/iio/dac/adi,ad5696.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#
-title: Analog Devices AD5696 and similar multi-channel DACs
+title: Analog Devices AD5696 and similar I2C multi-channel DACs
maintainers:
- Michael Auchter <[email protected]>
@@ -16,6 +16,7 @@ properties:
compatible:
enum:
- adi,ad5311r
+ - adi,ad5337r
- adi,ad5338r
- adi,ad5671r
- adi,ad5675r
diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 92d350b8e01a..c1e11bc6b7a0 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -36,10 +36,12 @@ properties:
items:
- enum:
- qcom,qcm2290-smmu-500
+ - qcom,qcs615-smmu-500
- qcom,qcs8300-smmu-500
- qcom,qdu1000-smmu-500
- qcom,sa8255p-smmu-500
- qcom,sa8775p-smmu-500
+ - qcom,sar2130p-smmu-500
- qcom,sc7180-smmu-500
- qcom,sc7280-smmu-500
- qcom,sc8180x-smmu-500
@@ -88,6 +90,7 @@ properties:
- qcom,qcm2290-smmu-500
- qcom,sa8255p-smmu-500
- qcom,sa8775p-smmu-500
+ - qcom,sar2130p-smmu-500
- qcom,sc7280-smmu-500
- qcom,sc8180x-smmu-500
- qcom,sc8280xp-smmu-500
@@ -524,6 +527,7 @@ allOf:
compatible:
items:
- enum:
+ - qcom,sar2130p-smmu-500
- qcom,sm8550-smmu-500
- qcom,sm8650-smmu-500
- qcom,x1e80100-smmu-500
@@ -555,6 +559,7 @@ allOf:
- cavium,smmu-v2
- marvell,ap806-smmu-500
- nvidia,smmu-500
+ - qcom,qcs615-smmu-500
- qcom,qcs8300-smmu-500
- qcom,qdu1000-smmu-500
- qcom,sa8255p-smmu-500
diff --git a/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
new file mode 100644
index 000000000000..5d015eeb06d0
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/riscv,iommu.yaml
@@ -0,0 +1,147 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/riscv,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V IOMMU Architecture Implementation
+
+maintainers:
+ - Tomasz Jeznach <[email protected]>
+
+description: |
+ The RISC-V IOMMU provides memory address translation and isolation for
+ input and output devices, supporting per-device translation context,
+ shared process address spaces including the ATS and PRI components of
+ the PCIe specification, two stage address translation and MSI remapping.
+ It supports identical translation table format to the RISC-V address
+ translation tables with page level access and protection attributes.
+ Hardware uses in-memory command and fault reporting queues with wired
+ interrupt or MSI notifications.
+
+ Visit https://github.com/riscv-non-isa/riscv-iommu for more details.
+
+ For information on assigning RISC-V IOMMU to its peripheral devices,
+ see generic IOMMU bindings.
+
+properties:
+ # For PCIe IOMMU hardware compatible property should contain the vendor
+ # and device ID according to the PCI Bus Binding specification.
+ # Since PCI provides built-in identification methods, compatible is not
+ # actually required. For non-PCIe hardware implementations 'riscv,iommu'
+ # should be specified along with 'reg' property providing MMIO location.
+ compatible:
+ oneOf:
+ - items:
+ - enum:
+ - qemu,riscv-iommu
+ - const: riscv,iommu
+ - items:
+ - enum:
+ - pci1efd,edf1
+ - const: riscv,pci-iommu
+
+ reg:
+ maxItems: 1
+ description:
+ For non-PCI devices this represents base address and size of for the
+ IOMMU memory mapped registers interface.
+ For PCI IOMMU hardware implementation this should represent an address
+ of the IOMMU, as defined in the PCI Bus Binding reference.
+
+ '#iommu-cells':
+ const: 1
+ description:
+ The single cell describes the requester id emitted by a master to the
+ IOMMU.
+
+ interrupts:
+ minItems: 1
+ maxItems: 4
+ description:
+ Wired interrupt vectors available for RISC-V IOMMU to notify the
+ RISC-V HARTS. The cause to interrupt vector is software defined
+ using IVEC IOMMU register.
+
+ msi-parent: true
+
+ power-domains:
+ maxItems: 1
+
+required:
+ - compatible
+ - reg
+ - '#iommu-cells'
+
+additionalProperties: false
+
+examples:
+ - |+
+ /* Example 1 (IOMMU device with wired interrupts) */
+ #include <dt-bindings/interrupt-controller/irq.h>
+
+ iommu1: iommu@1bccd000 {
+ compatible = "qemu,riscv-iommu", "riscv,iommu";
+ reg = <0x1bccd000 0x1000>;
+ interrupt-parent = <&aplic_smode>;
+ interrupts = <32 IRQ_TYPE_LEVEL_HIGH>,
+ <33 IRQ_TYPE_LEVEL_HIGH>,
+ <34 IRQ_TYPE_LEVEL_HIGH>,
+ <35 IRQ_TYPE_LEVEL_HIGH>;
+ #iommu-cells = <1>;
+ };
+
+ /* Device with two IOMMU device IDs, 0 and 7 */
+ master1 {
+ iommus = <&iommu1 0>, <&iommu1 7>;
+ };
+
+ - |+
+ /* Example 2 (IOMMU device with shared wired interrupt) */
+ #include <dt-bindings/interrupt-controller/irq.h>
+
+ iommu2: iommu@1bccd000 {
+ compatible = "qemu,riscv-iommu", "riscv,iommu";
+ reg = <0x1bccd000 0x1000>;
+ interrupt-parent = <&aplic_smode>;
+ interrupts = <32 IRQ_TYPE_LEVEL_HIGH>;
+ #iommu-cells = <1>;
+ };
+
+ - |+
+ /* Example 3 (IOMMU device with MSIs) */
+ iommu3: iommu@1bcdd000 {
+ compatible = "qemu,riscv-iommu", "riscv,iommu";
+ reg = <0x1bccd000 0x1000>;
+ msi-parent = <&imsics_smode>;
+ #iommu-cells = <1>;
+ };
+
+ - |+
+ /* Example 4 (IOMMU PCIe device with MSIs) */
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ pcie@30000000 {
+ device_type = "pci";
+ #address-cells = <3>;
+ #size-cells = <2>;
+ reg = <0x0 0x30000000 0x0 0x1000000>;
+ ranges = <0x02000000 0x0 0x41000000 0x0 0x41000000 0x0 0x0f000000>;
+
+ /*
+ * The IOMMU manages all functions in this PCI domain except
+ * itself. Omit BDF 00:01.0.
+ */
+ iommu-map = <0x0 &iommu0 0x0 0x8>,
+ <0x9 &iommu0 0x9 0xfff7>;
+
+ /* The IOMMU programming interface uses slot 00:01.0 */
+ iommu0: iommu@1,0 {
+ compatible = "pci1efd,edf1", "riscv,pci-iommu";
+ reg = <0x800 0 0 0 0>;
+ #iommu-cells = <1>;
+ };
+ };
+ };
diff --git a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml
index 23dfe0838dca..63bee5b542f5 100644
--- a/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml
+++ b/Documentation/devicetree/bindings/net/brcm,unimac-mdio.yaml
@@ -26,6 +26,7 @@ properties:
- brcm,asp-v2.1-mdio
- brcm,asp-v2.2-mdio
- brcm,unimac-mdio
+ - brcm,bcm6846-mdio
reg:
minItems: 1
diff --git a/Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml b/Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml
index e95c21628281..fb02e579463c 100644
--- a/Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml
+++ b/Documentation/devicetree/bindings/net/xlnx,axi-ethernet.yaml
@@ -61,7 +61,7 @@ properties:
- gmii
- rgmii
- sgmii
- - 1000BaseX
+ - 1000base-x
xlnx,phy-type:
description:
diff --git a/Documentation/devicetree/bindings/phy/qcom,sc8280xp-qmp-pcie-phy.yaml b/Documentation/devicetree/bindings/phy/qcom,sc8280xp-qmp-pcie-phy.yaml
index dcf4fa55fbba..380a9222a51d 100644
--- a/Documentation/devicetree/bindings/phy/qcom,sc8280xp-qmp-pcie-phy.yaml
+++ b/Documentation/devicetree/bindings/phy/qcom,sc8280xp-qmp-pcie-phy.yaml
@@ -154,8 +154,6 @@ allOf:
- qcom,sm8550-qmp-gen4x2-pcie-phy
- qcom,sm8650-qmp-gen3x2-pcie-phy
- qcom,sm8650-qmp-gen4x2-pcie-phy
- - qcom,x1e80100-qmp-gen3x2-pcie-phy
- - qcom,x1e80100-qmp-gen4x2-pcie-phy
then:
properties:
clocks:
@@ -171,6 +169,8 @@ allOf:
- qcom,sc8280xp-qmp-gen3x1-pcie-phy
- qcom,sc8280xp-qmp-gen3x2-pcie-phy
- qcom,sc8280xp-qmp-gen3x4-pcie-phy
+ - qcom,x1e80100-qmp-gen3x2-pcie-phy
+ - qcom,x1e80100-qmp-gen4x2-pcie-phy
- qcom,x1e80100-qmp-gen4x4-pcie-phy
then:
properties:
@@ -201,6 +201,7 @@ allOf:
- qcom,sm8550-qmp-gen4x2-pcie-phy
- qcom,sm8650-qmp-gen4x2-pcie-phy
- qcom,x1e80100-qmp-gen4x2-pcie-phy
+ - qcom,x1e80100-qmp-gen4x4-pcie-phy
then:
properties:
resets:
diff --git a/Documentation/devicetree/bindings/sound/davinci-mcasp-audio.yaml b/Documentation/devicetree/bindings/sound/davinci-mcasp-audio.yaml
index ab3206ffa4af..beef193aaaeb 100644
--- a/Documentation/devicetree/bindings/sound/davinci-mcasp-audio.yaml
+++ b/Documentation/devicetree/bindings/sound/davinci-mcasp-audio.yaml
@@ -102,21 +102,21 @@ properties:
default: 2
interrupts:
- oneOf:
- - minItems: 1
- items:
- - description: TX interrupt
- - description: RX interrupt
- - items:
- - description: common/combined interrupt
+ minItems: 1
+ maxItems: 2
interrupt-names:
oneOf:
- - minItems: 1
+ - description: TX interrupt
+ const: tx
+ - description: RX interrupt
+ const: rx
+ - description: TX and RX interrupts
items:
- const: tx
- const: rx
- - const: common
+ - description: Common/combined interrupt
+ const: common
fck_parent:
$ref: /schemas/types.yaml#/definitions/string
diff --git a/Documentation/devicetree/bindings/sound/rockchip,rk3308-codec.yaml b/Documentation/devicetree/bindings/sound/rockchip,rk3308-codec.yaml
index ecf3d7d968c8..2cf229a076f0 100644
--- a/Documentation/devicetree/bindings/sound/rockchip,rk3308-codec.yaml
+++ b/Documentation/devicetree/bindings/sound/rockchip,rk3308-codec.yaml
@@ -48,6 +48,10 @@ properties:
- const: mclk_rx
- const: hclk
+ port:
+ $ref: audio-graph-port.yaml#
+ unevaluatedProperties: false
+
resets:
maxItems: 1
diff --git a/Documentation/filesystems/caching/cachefiles.rst b/Documentation/filesystems/caching/cachefiles.rst
index e04a27bdbe19..b3ccc782cb3b 100644
--- a/Documentation/filesystems/caching/cachefiles.rst
+++ b/Documentation/filesystems/caching/cachefiles.rst
@@ -115,7 +115,7 @@ set up cache ready for use. The following script commands are available:
This mask can also be set through sysfs, eg::
- echo 5 >/sys/modules/cachefiles/parameters/debug
+ echo 5 > /sys/module/cachefiles/parameters/debug
Starting the Cache
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index 8e6c721d2330..b93115ab8748 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -208,7 +208,7 @@ The filesystem must arrange to `cancel
such `reservations
<https://lore.kernel.org/linux-xfs/[email protected]/>`_
because writeback will not consume the reservation.
-The ``iomap_file_buffered_write_punch_delalloc`` can be called from a
+The ``iomap_write_delalloc_release`` can be called from a
``->iomap_end`` function to find all the clean areas of the folios
caching a fresh (``IOMAP_F_NEW``) delalloc mapping.
It takes the ``invalidate_lock``.
diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst
index f0d2cb257bb8..73f0bfd7e903 100644
--- a/Documentation/filesystems/netfs_library.rst
+++ b/Documentation/filesystems/netfs_library.rst
@@ -592,4 +592,3 @@ API Function Reference
.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c
-.. kernel-doc:: fs/netfs/io.c
diff --git a/Documentation/iio/ad7380.rst b/Documentation/iio/ad7380.rst
index 9c784c1e652e..6f70b49b9ef2 100644
--- a/Documentation/iio/ad7380.rst
+++ b/Documentation/iio/ad7380.rst
@@ -41,13 +41,22 @@ supports only 1 SDO line.
Reference voltage
-----------------
-2 possible reference voltage sources are supported:
+ad7380-4
+~~~~~~~~
+
+ad7380-4 supports only an external reference voltage (2.5V to 3.3V). It must be
+declared in the device tree as ``refin-supply``.
+
+All other devices from ad738x family
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All other devices from ad738x support 2 possible reference voltage sources:
- Internal reference (2.5V)
- External reference (2.5V to 3.3V)
The source is determined by the device tree. If ``refio-supply`` is present,
-then the external reference is used, else the internal reference is used.
+then it is used as external reference, else the internal reference is used.
Oversampling and resolution boost
---------------------------------
diff --git a/Documentation/mm/damon/maintainer-profile.rst b/Documentation/mm/damon/maintainer-profile.rst
index 2365c9a3c1f0..ce3e98458339 100644
--- a/Documentation/mm/damon/maintainer-profile.rst
+++ b/Documentation/mm/damon/maintainer-profile.rst
@@ -7,26 +7,26 @@ The DAMON subsystem covers the files that are listed in 'DATA ACCESS MONITOR'
section of 'MAINTAINERS' file.
The mailing lists for the subsystem are [email protected] and
[email protected]. Patches should be made against the mm-unstable `tree
-<https://git.kernel.org/akpm/mm/h/mm-unstable>` whenever possible and posted to
-the mailing lists.
[email protected]. Patches should be made against the `mm-unstable tree
+<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ whenever possible and posted
+to the mailing lists.
SCM Trees
---------
There are multiple Linux trees for DAMON development. Patches under
development or testing are queued in `damon/next
-<https://git.kernel.org/sj/h/damon/next>` by the DAMON maintainer.
+<https://git.kernel.org/sj/h/damon/next>`_ by the DAMON maintainer.
Sufficiently reviewed patches will be queued in `mm-unstable
-<https://git.kernel.org/akpm/mm/h/mm-unstable>` by the memory management
+<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ by the memory management
subsystem maintainer. After more sufficient tests, the patches will be queued
-in `mm-stable <https://git.kernel.org/akpm/mm/h/mm-stable>` , and finally
+in `mm-stable <https://git.kernel.org/akpm/mm/h/mm-stable>`_, and finally
pull-requested to the mainline by the memory management subsystem maintainer.
-Note again the patches for mm-unstable `tree
-<https://git.kernel.org/akpm/mm/h/mm-unstable>` are queued by the memory
+Note again the patches for `mm-unstable tree
+<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ are queued by the memory
management subsystem maintainer. If the patches requires some patches in
-damon/next `tree <https://git.kernel.org/sj/h/damon/next>` which not yet merged
+`damon/next tree <https://git.kernel.org/sj/h/damon/next>`_ which not yet merged
in mm-unstable, please make sure the requirement is clearly specified.
Submit checklist addendum
@@ -37,25 +37,25 @@ When making DAMON changes, you should do below.
- Build changes related outputs including kernel and documents.
- Ensure the builds introduce no new errors or warnings.
- Run and ensure no new failures for DAMON `selftests
- <https://github.com/awslabs/damon-tests/blob/master/corr/run.sh#L49>` and
+ <https://github.com/damonitor/damon-tests/blob/master/corr/run.sh#L49>`_ and
`kunittests
- <https://github.com/awslabs/damon-tests/blob/master/corr/tests/kunit.sh>`.
+ <https://github.com/damonitor/damon-tests/blob/master/corr/tests/kunit.sh>`_.
Further doing below and putting the results will be helpful.
- Run `damon-tests/corr
- <https://github.com/awslabs/damon-tests/tree/master/corr>` for normal
+ <https://github.com/damonitor/damon-tests/tree/master/corr>`_ for normal
changes.
- Run `damon-tests/perf
- <https://github.com/awslabs/damon-tests/tree/master/perf>` for performance
+ <https://github.com/damonitor/damon-tests/tree/master/perf>`_ for performance
changes.
Key cycle dates
---------------
Patches can be sent anytime. Key cycle dates of the `mm-unstable
-<https://git.kernel.org/akpm/mm/h/mm-unstable>` and `mm-stable
-<https://git.kernel.org/akpm/mm/h/mm-stable>` trees depend on the memory
+<https://git.kernel.org/akpm/mm/h/mm-unstable>`_ and `mm-stable
+<https://git.kernel.org/akpm/mm/h/mm-stable>`_ trees depend on the memory
management subsystem maintainer.
Review cadence
@@ -72,13 +72,13 @@ Mailing tool
Like many other Linux kernel subsystems, DAMON uses the mailing lists
([email protected] and [email protected]) as the major communication
channel. There is a simple tool called `HacKerMaiL
-<https://github.com/damonitor/hackermail>` (``hkml``), which is for people who
+<https://github.com/damonitor/hackermail>`_ (``hkml``), which is for people who
are not very familiar with the mailing lists based communication. The tool
could be particularly helpful for DAMON community members since it is developed
and maintained by DAMON maintainer. The tool is also officially announced to
support DAMON and general Linux kernel development workflow.
-In other words, `hkml <https://github.com/damonitor/hackermail>` is a mailing
+In other words, `hkml <https://github.com/damonitor/hackermail>`_ is a mailing
tool for DAMON community, which DAMON maintainer is committed to support.
Please feel free to try and report issues or feature requests for the tool to
the maintainer.
@@ -98,8 +98,8 @@ slots, and attendees should reserve one of those at least 24 hours before the
time slot, by reaching out to the maintainer.
Schedules and available reservation time slots are available at the Google `doc
-<https://docs.google.com/document/d/1v43Kcj3ly4CYqmAkMaZzLiM2GEnWfgdGbZAH3mi2vpM/edit?usp=sharing>`.
+<https://docs.google.com/document/d/1v43Kcj3ly4CYqmAkMaZzLiM2GEnWfgdGbZAH3mi2vpM/edit?usp=sharing>`_.
There is also a public Google `calendar
-<https://calendar.google.com/calendar/u/0?cid=ZDIwOTA4YTMxNjc2MDQ3NTIyMmUzYTM5ZmQyM2U4NDA0ZGIwZjBiYmJlZGQxNDM0MmY4ZTRjOTE0NjdhZDRiY0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t>`
+<https://calendar.google.com/calendar/u/0?cid=ZDIwOTA4YTMxNjc2MDQ3NTIyMmUzYTM5ZmQyM2U4NDA0ZGIwZjBiYmJlZGQxNDM0MmY4ZTRjOTE0NjdhZDRiY0Bncm91cC5jYWxlbmRhci5nb29nbGUuY29t>`_
that has the events. Anyone can subscribe it. DAMON maintainer will also
provide periodic reminder to the mailing list ([email protected]).
diff --git a/Documentation/netlink/specs/mptcp_pm.yaml b/Documentation/netlink/specs/mptcp_pm.yaml
index 30d8342cacc8..dc190bf838fe 100644
--- a/Documentation/netlink/specs/mptcp_pm.yaml
+++ b/Documentation/netlink/specs/mptcp_pm.yaml
@@ -293,7 +293,6 @@ operations:
doc: Get endpoint information
attribute-set: attr
dont-validate: [ strict ]
- flags: [ uns-admin-perm ]
do: &get-addr-attrs
request:
attributes:
diff --git a/Documentation/networking/j1939.rst b/Documentation/networking/j1939.rst
index e4bd7aa1f5aa..544bad175aae 100644
--- a/Documentation/networking/j1939.rst
+++ b/Documentation/networking/j1939.rst
@@ -121,7 +121,7 @@ format, the Group Extension is set in the PS-field.
On the other hand, when using PDU1 format, the PS-field contains a so-called
Destination Address, which is _not_ part of the PGN. When communicating a PGN
-from user space to kernel (or vice versa) and PDU2 format is used, the PS-field
+from user space to kernel (or vice versa) and PDU1 format is used, the PS-field
of the PGN shall be set to zero. The Destination Address shall be set
elsewhere.
diff --git a/Documentation/networking/packet_mmap.rst b/Documentation/networking/packet_mmap.rst
index dca15d15feaf..02370786e77b 100644
--- a/Documentation/networking/packet_mmap.rst
+++ b/Documentation/networking/packet_mmap.rst
@@ -16,7 +16,7 @@ ii) transmit network traffic, or any other that needs raw
Howto can be found at:
- https://sites.google.com/site/packetmmap/
+ https://web.archive.org/web/20220404160947/https://sites.google.com/site/packetmmap/
Please send your comments to
- Ulisses Alonso Camaró <[email protected]>
@@ -166,7 +166,8 @@ As capture, each frame contains two parts::
/* bind socket to eth0 */
bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
- A complete tutorial is available at: https://sites.google.com/site/packetmmap/
+ A complete tutorial is available at:
+ https://web.archive.org/web/20220404160947/https://sites.google.com/site/packetmmap/
By default, the user should put data at::
diff --git a/Documentation/process/maintainer-soc.rst b/Documentation/process/maintainer-soc.rst
index 12637530d68f..fe9d8bcfbd2b 100644
--- a/Documentation/process/maintainer-soc.rst
+++ b/Documentation/process/maintainer-soc.rst
@@ -30,10 +30,13 @@ tree as a dedicated branch covering multiple subsystems.
The main SoC tree is housed on git.kernel.org:
https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git/
+Maintainers
+-----------
+
Clearly this is quite a wide range of topics, which no one person, or even
small group of people are capable of maintaining. Instead, the SoC subsystem
-is comprised of many submaintainers, each taking care of individual platforms
-and driver subdirectories.
+is comprised of many submaintainers (platform maintainers), each taking care of
+individual platforms and driver subdirectories.
In this regard, "platform" usually refers to a series of SoCs from a given
vendor, for example, Nvidia's series of Tegra SoCs. Many submaintainers operate
on a vendor level, responsible for multiple product lines. For several reasons,
@@ -43,14 +46,43 @@ MAINTAINERS file.
Most of these submaintainers have their own trees where they stage patches,
sending pull requests to the main SoC tree. These trees are usually, but not
-always, listed in MAINTAINERS. The main SoC maintainers can be reached via the
-alias [email protected] if there is no platform-specific maintainer, or if they
-are unresponsive.
+always, listed in MAINTAINERS.
What the SoC tree is not, however, is a location for architecture-specific code
changes. Each architecture has its own maintainers that are responsible for
architectural details, CPU errata and the like.
+Submitting Patches for Given SoC
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+All typical platform related patches should be sent via SoC submaintainers
+(platform-specific maintainers). This includes also changes to per-platform or
+shared defconfigs (scripts/get_maintainer.pl might not provide correct
+addresses in such case).
+
+Submitting Patches to the Main SoC Maintainers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The main SoC maintainers can be reached via the alias [email protected] only in
+following cases:
+
+1. There are no platform-specific maintainers.
+
+2. Platform-specific maintainers are unresponsive.
+
+3. Introducing a completely new SoC platform. Such new SoC work should be sent
+ first to common mailing lists, pointed out by scripts/get_maintainer.pl, for
+ community review. After positive community review, work should be sent to
+ [email protected] in one patchset containing new arch/foo/Kconfig entry, DTS
+ files, MAINTAINERS file entry and optionally initial drivers with their
+ Devicetree bindings. The MAINTAINERS file entry should list new
+ platform-specific maintainers, who are going to be responsible for handling
+ patches for the platform from now on.
+
+Note that the [email protected] is usually not the place to discuss the patches,
+thus work sent to this address should be already considered as acceptable by
+the community.
+
Information for (new) Submaintainers
------------------------------------
diff --git a/Documentation/rust/arch-support.rst b/Documentation/rust/arch-support.rst
index 750ff371570a..54be7ddf3e57 100644
--- a/Documentation/rust/arch-support.rst
+++ b/Documentation/rust/arch-support.rst
@@ -17,7 +17,7 @@ Architecture Level of support Constraints
============= ================ ==============================================
``arm64`` Maintained Little Endian only.
``loongarch`` Maintained \-
-``riscv`` Maintained ``riscv64`` only.
+``riscv`` Maintained ``riscv64`` and LLVM/Clang only.
``um`` Maintained \-
``x86`` Maintained ``x86_64`` only.
============= ================ ==============================================
diff --git a/Documentation/userspace-api/mseal.rst b/Documentation/userspace-api/mseal.rst
index 4132eec995a3..41102f74c5e2 100644
--- a/Documentation/userspace-api/mseal.rst
+++ b/Documentation/userspace-api/mseal.rst
@@ -23,177 +23,166 @@ applications can additionally seal security critical data at runtime.
A similar feature already exists in the XNU kernel with the
VM_FLAGS_PERMANENT flag [1] and on OpenBSD with the mimmutable syscall [2].
-User API
-========
-mseal()
------------
-The mseal() syscall has the following signature:
-
-``int mseal(void addr, size_t len, unsigned long flags)``
-
-**addr/len**: virtual memory address range.
-
-The address range set by ``addr``/``len`` must meet:
- - The start address must be in an allocated VMA.
- - The start address must be page aligned.
- - The end address (``addr`` + ``len``) must be in an allocated VMA.
- - no gap (unallocated memory) between start and end address.
-
-The ``len`` will be paged aligned implicitly by the kernel.
-
-**flags**: reserved for future use.
-
-**return values**:
-
-- ``0``: Success.
-
-- ``-EINVAL``:
- - Invalid input ``flags``.
- - The start address (``addr``) is not page aligned.
- - Address range (``addr`` + ``len``) overflow.
-
-- ``-ENOMEM``:
- - The start address (``addr``) is not allocated.
- - The end address (``addr`` + ``len``) is not allocated.
- - A gap (unallocated memory) between start and end address.
-
-- ``-EPERM``:
- - sealing is supported only on 64-bit CPUs, 32-bit is not supported.
-
-- For above error cases, users can expect the given memory range is
- unmodified, i.e. no partial update.
-
-- There might be other internal errors/cases not listed here, e.g.
- error during merging/splitting VMAs, or the process reaching the max
- number of supported VMAs. In those cases, partial updates to the given
- memory range could happen. However, those cases should be rare.
-
-**Blocked operations after sealing**:
- Unmapping, moving to another location, and shrinking the size,
- via munmap() and mremap(), can leave an empty space, therefore
- can be replaced with a VMA with a new set of attributes.
-
- Moving or expanding a different VMA into the current location,
- via mremap().
-
- Modifying a VMA via mmap(MAP_FIXED).
-
- Size expansion, via mremap(), does not appear to pose any
- specific risks to sealed VMAs. It is included anyway because
- the use case is unclear. In any case, users can rely on
- merging to expand a sealed VMA.
-
- mprotect() and pkey_mprotect().
-
- Some destructive madvice() behaviors (e.g. MADV_DONTNEED)
- for anonymous memory, when users don't have write permission to the
- memory. Those behaviors can alter region contents by discarding pages,
- effectively a memset(0) for anonymous memory.
-
- Kernel will return -EPERM for blocked operations.
-
- For blocked operations, one can expect the given address is unmodified,
- i.e. no partial update. Note, this is different from existing mm
- system call behaviors, where partial updates are made till an error is
- found and returned to userspace. To give an example:
-
- Assume following code sequence:
-
- - ptr = mmap(null, 8192, PROT_NONE);
- - munmap(ptr + 4096, 4096);
- - ret1 = mprotect(ptr, 8192, PROT_READ);
- - mseal(ptr, 4096);
- - ret2 = mprotect(ptr, 8192, PROT_NONE);
-
- ret1 will be -ENOMEM, the page from ptr is updated to PROT_READ.
-
- ret2 will be -EPERM, the page remains to be PROT_READ.
-
-**Note**:
-
-- mseal() only works on 64-bit CPUs, not 32-bit CPU.
-
-- users can call mseal() multiple times, mseal() on an already sealed memory
- is a no-action (not error).
-
-- munseal() is not supported.
-
-Use cases:
-==========
+SYSCALL
+=======
+mseal syscall signature
+-----------------------
+ ``int mseal(void \* addr, size_t len, unsigned long flags)``
+
+ **addr**/**len**: virtual memory address range.
+ The address range set by **addr**/**len** must meet:
+ - The start address must be in an allocated VMA.
+ - The start address must be page aligned.
+ - The end address (**addr** + **len**) must be in an allocated VMA.
+ - no gap (unallocated memory) between start and end address.
+
+ The ``len`` will be paged aligned implicitly by the kernel.
+
+ **flags**: reserved for future use.
+
+ **Return values**:
+ - **0**: Success.
+ - **-EINVAL**:
+ * Invalid input ``flags``.
+ * The start address (``addr``) is not page aligned.
+ * Address range (``addr`` + ``len``) overflow.
+ - **-ENOMEM**:
+ * The start address (``addr``) is not allocated.
+ * The end address (``addr`` + ``len``) is not allocated.
+ * A gap (unallocated memory) between start and end address.
+ - **-EPERM**:
+ * sealing is supported only on 64-bit CPUs, 32-bit is not supported.
+
+ **Note about error return**:
+ - For above error cases, users can expect the given memory range is
+ unmodified, i.e. no partial update.
+ - There might be other internal errors/cases not listed here, e.g.
+ error during merging/splitting VMAs, or the process reaching the maximum
+ number of supported VMAs. In those cases, partial updates to the given
+ memory range could happen. However, those cases should be rare.
+
+ **Architecture support**:
+ mseal only works on 64-bit CPUs, not 32-bit CPUs.
+
+ **Idempotent**:
+ users can call mseal multiple times. mseal on an already sealed memory
+ is a no-action (not error).
+
+ **no munseal**
+ Once mapping is sealed, it can't be unsealed. The kernel should never
+ have munseal, this is consistent with other sealing feature, e.g.
+ F_SEAL_SEAL for file.
+
+Blocked mm syscall for sealed mapping
+-------------------------------------
+ It might be important to note: **once the mapping is sealed, it will
+ stay in the process's memory until the process terminates**.
+
+ Example::
+
+ *ptr = mmap(0, 4096, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ rc = mseal(ptr, 4096, 0);
+ /* munmap will fail */
+ rc = munmap(ptr, 4096);
+ assert(rc < 0);
+
+ Blocked mm syscall:
+ - munmap
+ - mmap
+ - mremap
+ - mprotect and pkey_mprotect
+ - some destructive madvise behaviors: MADV_DONTNEED, MADV_FREE,
+ MADV_DONTNEED_LOCKED, MADV_FREE, MADV_DONTFORK, MADV_WIPEONFORK
+
+ The first set of syscalls to block is munmap, mremap, mmap. They can
+ either leave an empty space in the address space, therefore allowing
+ replacement with a new mapping with new set of attributes, or can
+ overwrite the existing mapping with another mapping.
+
+ mprotect and pkey_mprotect are blocked because they changes the
+ protection bits (RWX) of the mapping.
+
+ Certain destructive madvise behaviors, specifically MADV_DONTNEED,
+ MADV_FREE, MADV_DONTNEED_LOCKED, and MADV_WIPEONFORK, can introduce
+ risks when applied to anonymous memory by threads lacking write
+ permissions. Consequently, these operations are prohibited under such
+ conditions. The aforementioned behaviors have the potential to modify
+ region contents by discarding pages, effectively performing a memset(0)
+ operation on the anonymous memory.
+
+ Kernel will return -EPERM for blocked syscalls.
+
+ When blocked syscall return -EPERM due to sealing, the memory regions may
+ or may not be changed, depends on the syscall being blocked:
+
+ - munmap: munmap is atomic. If one of VMAs in the given range is
+ sealed, none of VMAs are updated.
+ - mprotect, pkey_mprotect, madvise: partial update might happen, e.g.
+ when mprotect over multiple VMAs, mprotect might update the beginning
+ VMAs before reaching the sealed VMA and return -EPERM.
+ - mmap and mremap: undefined behavior.
+
+Use cases
+=========
- glibc:
The dynamic linker, during loading ELF executables, can apply sealing to
- non-writable memory segments.
-
-- Chrome browser: protect some security sensitive data-structures.
+ mapping segments.
-Notes on which memory to seal:
-==============================
+- Chrome browser: protect some security sensitive data structures.
-It might be important to note that sealing changes the lifetime of a mapping,
-i.e. the sealed mapping won’t be unmapped till the process terminates or the
-exec system call is invoked. Applications can apply sealing to any virtual
-memory region from userspace, but it is crucial to thoroughly analyze the
-mapping's lifetime prior to apply the sealing.
+When not to use mseal
+=====================
+Applications can apply sealing to any virtual memory region from userspace,
+but it is *crucial to thoroughly analyze the mapping's lifetime* prior to
+apply the sealing. This is because the sealed mapping *won’t be unmapped*
+until the process terminates or the exec system call is invoked.
For example:
+ - aio/shm
+ aio/shm can call mmap and munmap on behalf of userspace, e.g.
+ ksys_shmdt() in shm.c. The lifetimes of those mapping are not tied to
+ the lifetime of the process. If those memories are sealed from userspace,
+ then munmap will fail, causing leaks in VMA address space during the
+ lifetime of the process.
+
+ - ptr allocated by malloc (heap)
+ Don't use mseal on the memory ptr return from malloc().
+ malloc() is implemented by allocator, e.g. by glibc. Heap manager might
+ allocate a ptr from brk or mapping created by mmap.
+ If an app calls mseal on a ptr returned from malloc(), this can affect
+ the heap manager's ability to manage the mappings; the outcome is
+ non-deterministic.
+
+ Example::
+
+ ptr = malloc(size);
+ /* don't call mseal on ptr return from malloc. */
+ mseal(ptr, size);
+ /* free will success, allocator can't shrink heap lower than ptr */
+ free(ptr);
+
+mseal doesn't block
+===================
+In a nutshell, mseal blocks certain mm syscall from modifying some of VMA's
+attributes, such as protection bits (RWX). Sealed mappings doesn't mean the
+memory is immutable.
-- aio/shm
-
- aio/shm can call mmap()/munmap() on behalf of userspace, e.g. ksys_shmdt() in
- shm.c. The lifetime of those mapping are not tied to the lifetime of the
- process. If those memories are sealed from userspace, then munmap() will fail,
- causing leaks in VMA address space during the lifetime of the process.
-
-- Brk (heap)
-
- Currently, userspace applications can seal parts of the heap by calling
- malloc() and mseal().
- let's assume following calls from user space:
-
- - ptr = malloc(size);
- - mprotect(ptr, size, RO);
- - mseal(ptr, size);
- - free(ptr);
-
- Technically, before mseal() is added, the user can change the protection of
- the heap by calling mprotect(RO). As long as the user changes the protection
- back to RW before free(), the memory range can be reused.
-
- Adding mseal() into the picture, however, the heap is then sealed partially,
- the user can still free it, but the memory remains to be RO. If the address
- is re-used by the heap manager for another malloc, the process might crash
- soon after. Therefore, it is important not to apply sealing to any memory
- that might get recycled.
-
- Furthermore, even if the application never calls the free() for the ptr,
- the heap manager may invoke the brk system call to shrink the size of the
- heap. In the kernel, the brk-shrink will call munmap(). Consequently,
- depending on the location of the ptr, the outcome of brk-shrink is
- nondeterministic.
-
-
-Additional notes:
-=================
As Jann Horn pointed out in [3], there are still a few ways to write
-to RO memory, which is, in a way, by design. Those cases are not covered
-by mseal(). If applications want to block such cases, sandbox tools (such as
-seccomp, LSM, etc) might be considered.
+to RO memory, which is, in a way, by design. And those could be blocked
+by different security measures.
Those cases are:
-- Write to read-only memory through /proc/self/mem interface.
-- Write to read-only memory through ptrace (such as PTRACE_POKETEXT).
-- userfaultfd.
+ - Write to read-only memory through /proc/self/mem interface (FOLL_FORCE).
+ - Write to read-only memory through ptrace (such as PTRACE_POKETEXT).
+ - userfaultfd.
The idea that inspired this patch comes from Stephen Röttger’s work in V8
CFI [4]. Chrome browser in ChromeOS will be the first user of this API.
-Reference:
-==========
-[1] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274
-
-[2] https://man.openbsd.org/mimmutable.2
-
-[3] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com
-
-[4] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc
+Reference
+=========
+- [1] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b9f69dbd3c8c3fd30a/osfmk/mach/vm_statistics.h#L274
+- [2] https://man.openbsd.org/mimmutable.2
+- [3] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com
+- [4] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXgeaRHo/edit#heading=h.bvaojj9fu6hc
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e32471977d0a..edc070c6e19b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8098,13 +8098,15 @@ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS By default, KVM emulates MONITOR/MWAIT (if
KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
disabled.
-KVM_X86_QUIRK_SLOT_ZAP_ALL By default, KVM invalidates all SPTEs in
- fast way for memslot deletion when VM type
- is KVM_X86_DEFAULT_VM.
- When this quirk is disabled or when VM type
- is other than KVM_X86_DEFAULT_VM, KVM zaps
- only leaf SPTEs that are within the range of
- the memslot being deleted.
+KVM_X86_QUIRK_SLOT_ZAP_ALL By default, for KVM_X86_DEFAULT_VM VMs, KVM
+ invalidates all SPTEs in all memslots and
+ address spaces when a memslot is deleted or
+ moved. When this quirk is disabled (or the
+ VM type isn't KVM_X86_DEFAULT_VM), KVM only
+ ensures the backing memory of the deleted
+ or moved memslot isn't reachable, i.e KVM
+ _may_ invalidate only SPTEs related to the
+ memslot.
=================================== ============================================
7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 20a9a37d1cdd..1bedd56e2fe3 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -136,7 +136,7 @@ For direct sp, we can easily avoid it since the spte of direct sp is fixed
to gfn. For indirect sp, we disabled fast page fault for simplicity.
A solution for indirect sp could be to pin the gfn, for example via
-kvm_vcpu_gfn_to_pfn_atomic, before the cmpxchg. After the pinning:
+gfn_to_pfn_memslot_atomic, before the cmpxchg. After the pinning:
- We have held the refcount of pfn; that means the pfn can not be freed and
be reused for another gfn.