16 files changed, 1145 insertions, 117 deletions
diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst
index a6ba95fbaa9f..ce63be6d64ad 100644
--- a/Documentation/admin-guide/LSM/index.rst
+++ b/Documentation/admin-guide/LSM/index.rst
@@ -47,3 +47,4 @@ subdirectories.
    tomoyo
    Yama
    SafeSetID
+   ipe
diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst
new file mode 100644
index 000000000000..f38e641df0e9
--- /dev/null
+++ b/Documentation/admin-guide/LSM/ipe.rst
@@ -0,0 +1,790 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Integrity Policy Enforcement (IPE)
+==================================
+
+.. NOTE::
+
+   This is the documentation for admins, system builders, or individuals
+   attempting to use IPE. If you're looking for more developer-focused
+   documentation about IPE please see :doc:`the design docs </security/ipe>`.
+
+Overview
+--------
+
+Integrity Policy Enforcement (IPE) is a Linux Security Module that takes a
+complementary approach to access control. Unlike traditional access control
+mechanisms that rely on labels and paths for decision-making, IPE focuses
+on the immutable security properties inherent to system components. These
+properties are fundamental attributes or features of a system component
+that cannot be altered, ensuring a consistent and reliable basis for
+security decisions.
+
+To elaborate, in the context of IPE, system components primarily refer to
+files or the devices these files reside on. However, this is just a
+starting point. The concept of system components is flexible and can be
+extended to include new elements as the system evolves. The immutable
+properties include the origin of a file, which remains constant and
+unchangeable over time. For example, IPE policies can be crafted to trust
+files originating from the initramfs. Since initramfs is typically verified
+by the bootloader, its files are deemed trustworthy; "file is from
+initramfs" becomes an immutable property under IPE's consideration.
+
+The immutable property concept extends to the security features enabled on
+a file's origin, such as dm-verity or fs-verity, which provide a layer of
+integrity and trust. For example, IPE allows the definition of policies
+that trust files from a dm-verity protected device. dm-verity ensures the
+integrity of an entire device by providing a verifiable and immutable state
+of its contents. Similarly, fs-verity offers filesystem-level integrity
+checks, allowing IPE to enforce policies that trust files protected by
+fs-verity. These two features cannot be turned off once established, so
+they are considered immutable properties. These examples demonstrate how
+IPE leverages immutable properties, such as a file's origin and its
+integrity protection mechanisms, to make access control decisions.
+
+For the IPE policy, specifically, it grants the ability to enforce
+stringent access controls by assessing security properties against
+reference values defined within the policy. This assessment can be based on
+the existence of a security property (e.g., verifying if a file originates
+from initramfs) or evaluating the internal state of an immutable security
+property. The latter includes checking the roothash of a dm-verity
+protected device, determining whether dm-verity possesses a valid
+signature, assessing the digest of a fs-verity protected file, or
+determining whether fs-verity possesses a valid built-in signature. This
+nuanced approach to policy enforcement enables a highly secure and
+customizable system defense mechanism, tailored to specific security
+requirements and trust models.
+
+To enable IPE, ensure that ``CONFIG_SECURITY_IPE`` (under
+:menuselection:`Security -> Integrity Policy Enforcement (IPE)`) config
+option is enabled.
+
+Use Cases
+---------
+
+IPE works best in fixed-function devices: devices in which their purpose
+is clearly defined and not supposed to be changed (e.g. network firewall
+device in a data center, an IoT device, etcetera), where all software and
+configuration is built and provisioned by the system owner.
+
+IPE is a long-way off for use in general-purpose computing: the Linux
+community as a whole tends to follow a decentralized trust model (known as
+the web of trust), which IPE has no support for it yet. Instead, IPE
+supports PKI (public key infrastructure), which generally designates a
+set of trusted entities that provide a measure of absolute trust.
+
+Additionally, while most packages are signed today, the files inside
+the packages (for instance, the executables), tend to be unsigned. This
+makes it difficult to utilize IPE in systems where a package manager is
+expected to be functional, without major changes to the package manager
+and ecosystem behind it.
+
+The digest_cache LSM [#digest_cache_lsm]_ is a system that when combined with IPE,
+could be used to enable and support general-purpose computing use cases.
+
+Known Limitations
+-----------------
+
+IPE cannot verify the integrity of anonymous executable memory, such as
+the trampolines created by gcc closures and libffi (<3.4.2), or JIT'd code.
+Unfortunately, as this is dynamically generated code, there is no way
+for IPE to ensure the integrity of this code to form a trust basis.
+
+IPE cannot verify the integrity of programs written in interpreted
+languages when these scripts are invoked by passing these program files
+to the interpreter. This is because the way interpreters execute these
+files; the scripts themselves are not evaluated as executable code
+through one of IPE's hooks, but they are merely text files that are read
+(as opposed to compiled executables) [#interpreters]_.
+
+Threat Model
+------------
+
+IPE specifically targets the risk of tampering with user-space executable
+code after the kernel has initially booted, including the kernel modules
+loaded from userspace via ``modprobe`` or ``insmod``.
+
+To illustrate, consider a scenario where an untrusted binary, possibly
+malicious, is downloaded along with all necessary dependencies, including a
+loader and libc. The primary function of IPE in this context is to prevent
+the execution of such binaries and their dependencies.
+
+IPE achieves this by verifying the integrity and authenticity of all
+executable code before allowing them to run. It conducts a thorough
+check to ensure that the code's integrity is intact and that they match an
+authorized reference value (digest, signature, etc) as per the defined
+policy. If a binary does not pass this verification process, either
+because its integrity has been compromised or it does not meet the
+authorization criteria, IPE will deny its execution. Additionally, IPE
+generates audit logs which may be utilized to detect and analyze failures
+resulting from policy violation.
+
+Tampering threat scenarios include modification or replacement of
+executable code by a range of actors including:
+
+-  Actors with physical access to the hardware
+-  Actors with local network access to the system
+-  Actors with access to the deployment system
+-  Compromised internal systems under external control
+-  Malicious end users of the system
+-  Compromised end users of the system
+-  Remote (external) compromise of the system
+
+IPE does not mitigate threats arising from malicious but authorized
+developers (with access to a signing certificate), or compromised
+developer tools used by them (i.e. return-oriented programming attacks).
+Additionally, IPE draws hard security boundary between userspace and
+kernelspace. As a result, kernel-level exploits are considered outside
+the scope of IPE and mitigation is left to other mechanisms.
+
+Policy
+------
+
+IPE policy is a plain-text [#devdoc]_ policy composed of multiple statements
+over several lines. There is one required line, at the top of the
+policy, indicating the policy name, and the policy version, for
+instance::
+
+   policy_name=Ex_Policy policy_version=0.0.0
+
+The policy name is a unique key identifying this policy in a human
+readable name. This is used to create nodes under securityfs as well as
+uniquely identify policies to deploy new policies vs update existing
+policies.
+
+The policy version indicates the current version of the policy (NOT the
+policy syntax version). This is used to prevent rollback of policy to
+potentially insecure previous versions of the policy.
+
+The next portion of IPE policy are rules. Rules are formed by key=value
+pairs, known as properties. IPE rules require two properties: ``action``,
+which determines what IPE does when it encounters a match against the
+rule, and ``op``, which determines when the rule should be evaluated.
+The ordering is significant, a rule must start with ``op``, and end with
+``action``. Thus, a minimal rule is::
+
+   op=EXECUTE action=ALLOW
+
+This example will allow any execution. Additional properties are used to
+assess immutable security properties about the files being evaluated.
+These properties are intended to be descriptions of systems within the
+kernel that can provide a measure of integrity verification, such that IPE
+can determine the trust of the resource based on the value of the property.
+
+Rules are evaluated top-to-bottom. As a result, any revocation rules,
+or denies should be placed early in the file to ensure that these rules
+are evaluated before a rule with ``action=ALLOW``.
+
+IPE policy supports comments. The character '#' will function as a
+comment, ignoring all characters to the right of '#' until the newline.
+
+The default behavior of IPE evaluations can also be expressed in policy,
+through the ``DEFAULT`` statement. This can be done at a global level,
+or a per-operation level::
+
+   # Global
+   DEFAULT action=ALLOW
+
+   # Operation Specific
+   DEFAULT op=EXECUTE action=ALLOW
+
+A default must be set for all known operations in IPE. If you want to
+preserve older policies being compatible with newer kernels that can introduce
+new operations, set a global default of ``ALLOW``, then override the
+defaults on a per-operation basis (as above).
+
+With configurable policy-based LSMs, there's several issues with
+enforcing the configurable policies at startup, around reading and
+parsing the policy:
+
+1. The kernel *should* not read files from userspace, so directly reading
+   the policy file is prohibited.
+2. The kernel command line has a character limit, and one kernel module
+   should not reserve the entire character limit for its own
+   configuration.
+3. There are various boot loaders in the kernel ecosystem, so handing
+   off a memory block would be costly to maintain.
+
+As a result, IPE has addressed this problem through a concept of a "boot
+policy". A boot policy is a minimal policy which is compiled into the
+kernel. This policy is intended to get the system to a state where
+userspace is set up and ready to receive commands, at which point a more
+complex policy can be deployed via securityfs. The boot policy can be
+specified via ``SECURITY_IPE_BOOT_POLICY`` config option, which accepts
+a path to a plain-text version of the IPE policy to apply. This policy
+will be compiled into the kernel. If not specified, IPE will be disabled
+until a policy is deployed and activated through securityfs.
+
+Deploying Policies
+~~~~~~~~~~~~~~~~~~
+
+Policies can be deployed from userspace through securityfs. These policies
+are signed through the PKCS#7 message format to enforce some level of
+authorization of the policies (prohibiting an attacker from gaining
+unconstrained root, and deploying an "allow all" policy). These
+policies must be signed by a certificate that chains to the
+``SYSTEM_TRUSTED_KEYRING``. With openssl, the policy can be signed by::
+
+   openssl smime -sign \
+      -in "$MY_POLICY" \
+      -signer "$MY_CERTIFICATE" \
+      -inkey "$MY_PRIVATE_KEY" \
+      -noattr \
+      -nodetach \
+      -nosmimecap \
+      -outform der \
+      -out "$MY_POLICY.p7b"
+
+Deploying the policies is done through securityfs, through the
+``new_policy`` node. To deploy a policy, simply cat the file into the
+securityfs node::
+
+   cat "$MY_POLICY.p7b" > /sys/kernel/security/ipe/new_policy
+
+Upon success, this will create one subdirectory under
+``/sys/kernel/security/ipe/policies/``. The subdirectory will be the
+``policy_name`` field of the policy deployed, so for the example above,
+the directory will be ``/sys/kernel/security/ipe/policies/Ex_Policy``.
+Within this directory, there will be seven files: ``pkcs7``, ``policy``,
+``name``, ``version``, ``active``, ``update``, and ``delete``.
+
+The ``pkcs7`` file is read-only. Reading it returns the raw PKCS#7 data
+that was provided to the kernel, representing the policy. If the policy being
+read is the boot policy, this will return ``ENOENT``, as it is not signed.
+
+The ``policy`` file is read only. Reading it returns the PKCS#7 inner
+content of the policy, which will be the plain text policy.
+
+The ``active`` file is used to set a policy as the currently active policy.
+This file is rw, and accepts a value of ``"1"`` to set the policy as active.
+Since only a single policy can be active at one time, all other policies
+will be marked inactive. The policy being marked active must have a policy
+version greater or equal to the currently-running version.
+
+The ``update`` file is used to update a policy that is already present
+in the kernel. This file is write-only and accepts a PKCS#7 signed
+policy. Two checks will always be performed on this policy: First, the
+``policy_names`` must match with the updated version and the existing
+version. Second the updated policy must have a policy version greater than
+or equal to the currently-running version. This is to prevent rollback attacks.
+
+The ``delete`` file is used to remove a policy that is no longer needed.
+This file is write-only and accepts a value of ``1`` to delete the policy.
+On deletion, the securityfs node representing the policy will be removed.
+However, delete the current active policy is not allowed and will return
+an operation not permitted error.
+
+Similarly, writing to both ``update`` and ``new_policy`` could result in
+bad message(policy syntax error) or file exists error. The latter error happens
+when trying to deploy a policy with a ``policy_name`` while the kernel already
+has a deployed policy with the same ``policy_name``.
+
+Deploying a policy will *not* cause IPE to start enforcing the policy. IPE will
+only enforce the policy marked active. Note that only one policy can be active
+at a time.
+
+Once deployment is successful, the policy can be activated, by writing file
+``/sys/kernel/security/ipe/policies/$policy_name/active``.
+For example, the ``Ex_Policy`` can be activated by::
+
+   echo 1 > "/sys/kernel/security/ipe/policies/Ex_Policy/active"
+
+From above point on, ``Ex_Policy`` is now the enforced policy on the
+system.
+
+IPE also provides a way to delete policies. This can be done via the
+``delete`` securityfs node,
+``/sys/kernel/security/ipe/policies/$policy_name/delete``.
+Writing ``1`` to that file deletes the policy::
+
+   echo 1 > "/sys/kernel/security/ipe/policies/$policy_name/delete"
+
+There is only one requirement to delete a policy: the policy being deleted
+must be inactive.
+
+.. NOTE::
+
+   If a traditional MAC system is enabled (SELinux, apparmor, smack), all
+   writes to ipe's securityfs nodes require ``CAP_MAC_ADMIN``.
+
+Modes
+~~~~~
+
+IPE supports two modes of operation: permissive (similar to SELinux's
+permissive mode) and enforced. In permissive mode, all events are
+checked and policy violations are logged, but the policy is not really
+enforced. This allows users to test policies before enforcing them.
+
+The default mode is enforce, and can be changed via the kernel command
+line parameter ``ipe.enforce=(0|1)``, or the securityfs node
+``/sys/kernel/security/ipe/enforce``.
+
+.. NOTE::
+
+   If a traditional MAC system is enabled (SELinux, apparmor, smack, etcetera),
+   all writes to ipe's securityfs nodes require ``CAP_MAC_ADMIN``.
+
+Audit Events
+~~~~~~~~~~~~
+
+1420 AUDIT_IPE_ACCESS
+^^^^^^^^^^^^^^^^^^^^^
+Event Examples::
+
+   type=1420 audit(1653364370.067:61): ipe_op=EXECUTE ipe_hook=MMAP enforcing=1 pid=2241 comm="ld-linux.so" path="/deny/lib/libc.so.6" dev="sda2" ino=14549020 rule="DEFAULT action=DENY"
+   type=1300 audit(1653364370.067:61): SYSCALL arch=c000003e syscall=9 success=no exit=-13 a0=7f1105a28000 a1=195000 a2=5 a3=812 items=0 ppid=2219 pid=2241 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="ld-linux.so" exe="/tmp/ipe-test/lib/ld-linux.so" subj=unconfined key=(null)
+   type=1327 audit(1653364370.067:61): 707974686F6E3300746573742F6D61696E2E7079002D6E00
+
+   type=1420 audit(1653364735.161:64): ipe_op=EXECUTE ipe_hook=MMAP enforcing=1 pid=2472 comm="mmap_test" path=? dev=? ino=? rule="DEFAULT action=DENY"
+   type=1300 audit(1653364735.161:64): SYSCALL arch=c000003e syscall=9 success=no exit=-13 a0=0 a1=1000 a2=4 a3=21 items=0 ppid=2219 pid=2472 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="mmap_test" exe="/root/overlake_test/upstream_test/vol_fsverity/bin/mmap_test" subj=unconfined key=(null)
+   type=1327 audit(1653364735.161:64): 707974686F6E3300746573742F6D61696E2E7079002D6E00
+
+This event indicates that IPE made an access control decision; the IPE
+specific record (1420) is always emitted in conjunction with a
+``AUDITSYSCALL`` record.
+
+Determining whether IPE is in permissive or enforced mode can be derived
+from ``success`` property and exit code of the ``AUDITSYSCALL`` record.
+
+
+Field descriptions:
+
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+| Field     | Value Type | Optional? | Description of Value                                                            |
++===========+============+===========+=================================================================================+
+| ipe_op    | string     | No        | The IPE operation name associated with the log                                  |
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+| ipe_hook  | string     | No        | The name of the LSM hook that triggered the IPE event                           |
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+| enforcing | integer    | No        | The current IPE enforcing state 1 is in enforcing mode, 0 is in permissive mode |
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+| pid       | integer    | No        | The pid of the process that triggered the IPE event.                            |
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+| comm      | string     | No        | The command line program name of the process that triggered the IPE event       |
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+| path      | string     | Yes       | The absolute path to the evaluated file                                         |
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+| ino       | integer    | Yes       | The inode number of the evaluated file                                          |
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+| dev       | string     | Yes       | The device name of the evaluated file, e.g. vda                                 |
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+| rule      | string     | No        | The matched policy rule                                                         |
++-----------+------------+-----------+---------------------------------------------------------------------------------+
+
+1421 AUDIT_IPE_CONFIG_CHANGE
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Event Example::
+
+   type=1421 audit(1653425583.136:54): old_active_pol_name="Allow_All" old_active_pol_version=0.0.0 old_policy_digest=sha256:E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855 new_active_pol_name="boot_verified" new_active_pol_version=0.0.0 new_policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1
+   type=1300 audit(1653425583.136:54): SYSCALL arch=c000003e syscall=1 success=yes exit=2 a0=3 a1=5596fcae1fb0 a2=2 a3=2 items=0 ppid=184 pid=229 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="python3" exe="/usr/bin/python3.10" key=(null)
+   type=1327 audit(1653425583.136:54): PROCTITLE proctitle=707974686F6E3300746573742F6D61696E2E7079002D66002E2
+
+This event indicates that IPE switched the active poliy from one to another
+along with the version and the hash digest of the two policies.
+Note IPE can only have one policy active at a time, all access decision
+evaluation is based on the current active policy.
+The normal procedure to deploy a new policy is loading the policy to deploy
+into the kernel first, then switch the active policy to it.
+
+This record will always be emitted in conjunction with a ``AUDITSYSCALL`` record for the ``write`` syscall.
+
+Field descriptions:
+
++------------------------+------------+-----------+---------------------------------------------------+
+| Field                  | Value Type | Optional? | Description of Value                              |
++========================+============+===========+===================================================+
+| old_active_pol_name    | string     | Yes       | The name of previous active policy                |
++------------------------+------------+-----------+---------------------------------------------------+
+| old_active_pol_version | string     | Yes       | The version of previous active policy             |
++------------------------+------------+-----------+---------------------------------------------------+
+| old_policy_digest      | string     | Yes       | The hash of previous active policy                |
++------------------------+------------+-----------+---------------------------------------------------+
+| new_active_pol_name    | string     | No        | The name of current active policy                 |
++------------------------+------------+-----------+---------------------------------------------------+
+| new_active_pol_version | string     | No        | The version of current active policy              |
++------------------------+------------+-----------+---------------------------------------------------+
+| new_policy_digest      | string     | No        | The hash of current active policy                 |
++------------------------+------------+-----------+---------------------------------------------------+
+| auid                   | integer    | No        | The login user ID                                 |
++------------------------+------------+-----------+---------------------------------------------------+
+| ses                    | integer    | No        | The login session ID                              |
++------------------------+------------+-----------+---------------------------------------------------+
+| lsm                    | string     | No        | The lsm name associated with the event            |
++------------------------+------------+-----------+---------------------------------------------------+
+| res                    | integer    | No        | The result of the audited operation(success/fail) |
++------------------------+------------+-----------+---------------------------------------------------+
+
+1422 AUDIT_IPE_POLICY_LOAD
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Event Example::
+
+   type=1422 audit(1653425529.927:53): policy_name="boot_verified" policy_version=0.0.0 policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1
+   type=1300 audit(1653425529.927:53): arch=c000003e syscall=1 success=yes exit=2567 a0=3 a1=5596fcae1fb0 a2=a07 a3=2 items=0 ppid=184 pid=229 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="python3" exe="/usr/bin/python3.10" key=(null)
+   type=1327 audit(1653425529.927:53): PROCTITLE proctitle=707974686F6E3300746573742F6D61696E2E7079002D66002E2E
+
+This record indicates a new policy has been loaded into the kernel with the policy name, policy version and policy hash.
+
+This record will always be emitted in conjunction with a ``AUDITSYSCALL`` record for the ``write`` syscall.
+
+Field descriptions:
+
++----------------+------------+-----------+---------------------------------------------------+
+| Field          | Value Type | Optional? | Description of Value                              |
++================+============+===========+===================================================+
+| policy_name    | string     | No        | The policy_name                                   |
++----------------+------------+-----------+---------------------------------------------------+
+| policy_version | string     | No        | The policy_version                                |
++----------------+------------+-----------+---------------------------------------------------+
+| policy_digest  | string     | No        | The policy hash                                   |
++----------------+------------+-----------+---------------------------------------------------+
+| auid           | integer    | No        | The login user ID                                 |
++----------------+------------+-----------+---------------------------------------------------+
+| ses            | integer    | No        | The login session ID                              |
++----------------+------------+-----------+---------------------------------------------------+
+| lsm            | string     | No        | The lsm name associated with the event            |
++----------------+------------+-----------+---------------------------------------------------+
+| res            | integer    | No        | The result of the audited operation(success/fail) |
++----------------+------------+-----------+---------------------------------------------------+
+
+
+1404 AUDIT_MAC_STATUS
+^^^^^^^^^^^^^^^^^^^^^
+
+Event Examples::
+
+   type=1404 audit(1653425689.008:55): enforcing=0 old_enforcing=1 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=ipe res=1
+   type=1300 audit(1653425689.008:55): arch=c000003e syscall=1 success=yes exit=2 a0=1 a1=55c1065e5c60 a2=2 a3=0 items=0 ppid=405 pid=441 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=)
+   type=1327 audit(1653425689.008:55): proctitle="-bash"
+
+   type=1404 audit(1653425689.008:55): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=ipe res=1
+   type=1300 audit(1653425689.008:55): arch=c000003e syscall=1 success=yes exit=2 a0=1 a1=55c1065e5c60 a2=2 a3=0 items=0 ppid=405 pid=441 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=)
+   type=1327 audit(1653425689.008:55): proctitle="-bash"
+
+This record will always be emitted in conjunction with a ``AUDITSYSCALL`` record for the ``write`` syscall.
+
+Field descriptions:
+
++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+
+| Field         | Value Type | Optional? | Description of Value                                                                            |
++===============+============+===========+=================================================================================================+
+| enforcing     | integer    | No        | The enforcing state IPE is being switched to, 1 is in enforcing mode, 0 is in permissive mode   |
++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+
+| old_enforcing | integer    | No        | The enforcing state IPE is being switched from, 1 is in enforcing mode, 0 is in permissive mode |
++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+
+| auid          | integer    | No        | The login user ID                                                                               |
++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+
+| ses           | integer    | No        | The login session ID                                                                            |
++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+
+| enabled       | integer    | No        | The new TTY audit enabled setting                                                               |
++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+
+| old-enabled   | integer    | No        | The old TTY audit enabled setting                                                               |
++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+
+| lsm           | string     | No        | The lsm name associated with the event                                                          |
++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+
+| res           | integer    | No        | The result of the audited operation(success/fail)                                               |
++---------------+------------+-----------+-------------------------------------------------------------------------------------------------+
+
+
+Success Auditing
+^^^^^^^^^^^^^^^^
+
+IPE supports success auditing. When enabled, all events that pass IPE
+policy and are not blocked will emit an audit event. This is disabled by
+default, and can be enabled via the kernel command line
+``ipe.success_audit=(0|1)`` or
+``/sys/kernel/security/ipe/success_audit`` securityfs file.
+
+This is *very* noisy, as IPE will check every userspace binary on the
+system, but is useful for debugging policies.
+
+.. NOTE::
+
+   If a traditional MAC system is enabled (SELinux, apparmor, smack, etcetera),
+   all writes to ipe's securityfs nodes require ``CAP_MAC_ADMIN``.
+
+Properties
+----------
+
+As explained above, IPE properties are ``key=value`` pairs expressed in IPE
+policy. Two properties are built-into the policy parser: 'op' and 'action'.
+The other properties are used to restrict immutable security properties
+about the files being evaluated. Currently those properties are:
+'``boot_verified``', '``dmverity_signature``', '``dmverity_roothash``',
+'``fsverity_signature``', '``fsverity_digest``'. A description of all
+properties supported by IPE are listed below:
+
+op
+~~
+
+Indicates the operation for a rule to apply to. Must be in every rule,
+as the first token. IPE supports the following operations:
+
+   ``EXECUTE``
+
+      Pertains to any file attempting to be executed, or loaded as an
+      executable.
+
+   ``FIRMWARE``:
+
+      Pertains to firmware being loaded via the firmware_class interface.
+      This covers both the preallocated buffer and the firmware file
+      itself.
+
+   ``KMODULE``:
+
+      Pertains to loading kernel modules via ``modprobe`` or ``insmod``.
+
+   ``KEXEC_IMAGE``:
+
+      Pertains to kernel images loading via ``kexec``.
+
+   ``KEXEC_INITRAMFS``
+
+      Pertains to initrd images loading via ``kexec --initrd``.
+
+   ``POLICY``:
+
+      Controls loading policies via reading a kernel-space initiated read.
+
+      An example of such is loading IMA policies by writing the path
+      to the policy file to ``$securityfs/ima/policy``
+
+   ``X509_CERT``:
+
+      Controls loading IMA certificates through the Kconfigs,
+      ``CONFIG_IMA_X509_PATH`` and ``CONFIG_EVM_X509_PATH``.
+
+action
+~~~~~~
+
+   Determines what IPE should do when a rule matches. Must be in every
+   rule, as the final clause. Can be one of:
+
+   ``ALLOW``:
+
+      If the rule matches, explicitly allow access to the resource to proceed
+      without executing any more rules.
+
+   ``DENY``:
+
+      If the rule matches, explicitly prohibit access to the resource to
+      proceed without executing any more rules.
+
+boot_verified
+~~~~~~~~~~~~~
+
+   This property can be utilized for authorization of files from initramfs.
+   The format of this property is::
+
+         boot_verified=(TRUE|FALSE)
+
+
+   .. WARNING::
+
+      This property will trust files from initramfs(rootfs). It should
+      only be used during early booting stage. Before mounting the real
+      rootfs on top of the initramfs, initramfs script will recursively
+      remove all files and directories on the initramfs. This is typically
+      implemented by using switch_root(8) [#switch_root]_. Therefore the
+      initramfs will be empty and not accessible after the real
+      rootfs takes over. It is advised to switch to a different policy
+      that doesn't rely on the property after this point.
+      This ensures that the trust policies remain relevant and effective
+      throughout the system's operation.
+
+dmverity_roothash
+~~~~~~~~~~~~~~~~~
+
+   This property can be utilized for authorization or revocation of
+   specific dm-verity volumes, identified via their root hashes. It has a
+   dependency on the DM_VERITY module. This property is controlled by
+   the ``IPE_PROP_DM_VERITY`` config option, it will be automatically
+   selected when ``SECURITY_IPE`` and ``DM_VERITY`` are all enabled.
+   The format of this property is::
+
+      dmverity_roothash=DigestName:HexadecimalString
+
+   The supported DigestNames for dmverity_roothash are [#dmveritydigests]_
+
+      + blake2b-512
+      + blake2s-256
+      + sha256
+      + sha384
+      + sha512
+      + sha3-224
+      + sha3-256
+      + sha3-384
+      + sha3-512
+      + sm3
+      + rmd160
+
+dmverity_signature
+~~~~~~~~~~~~~~~~~~
+
+   This property can be utilized for authorization of all dm-verity
+   volumes that have a signed roothash that validated by a keyring
+   specified by dm-verity's configuration, either the system trusted
+   keyring, or the secondary keyring. It depends on
+   ``DM_VERITY_VERIFY_ROOTHASH_SIG`` config option and is controlled by
+   the ``IPE_PROP_DM_VERITY_SIGNATURE`` config option, it will be automatically
+   selected when ``SECURITY_IPE``, ``DM_VERITY`` and
+   ``DM_VERITY_VERIFY_ROOTHASH_SIG`` are all enabled.
+   The format of this property is::
+
+      dmverity_signature=(TRUE|FALSE)
+
+fsverity_digest
+~~~~~~~~~~~~~~~
+
+   This property can be utilized for authorization of specific fsverity
+   enabled files, identified via their fsverity digests.
+   It depends on ``FS_VERITY`` config option and is controlled by
+   the ``IPE_PROP_FS_VERITY`` config option, it will be automatically
+   selected when ``SECURITY_IPE`` and ``FS_VERITY`` are all enabled.
+   The format of this property is::
+
+      fsverity_digest=DigestName:HexadecimalString
+
+   The supported DigestNames for fsverity_digest are [#fsveritydigest]_
+
+      + sha256
+      + sha512
+
+fsverity_signature
+~~~~~~~~~~~~~~~~~~
+
+   This property is used to authorize all fs-verity enabled files that have
+   been verified by fs-verity's built-in signature mechanism. The signature
+   verification relies on a key stored within the ".fs-verity" keyring. It
+   depends on ``FS_VERITY_BUILTIN_SIGNATURES`` config option and
+   it is controlled by the ``IPE_PROP_FS_VERITY`` config option,
+   it will be automatically selected when ``SECURITY_IPE``, ``FS_VERITY``
+   and ``FS_VERITY_BUILTIN_SIGNATURES`` are all enabled.
+   The format of this property is::
+
+      fsverity_signature=(TRUE|FALSE)
+
+Policy Examples
+---------------
+
+Allow all
+~~~~~~~~~
+
+::
+
+   policy_name=Allow_All policy_version=0.0.0
+   DEFAULT action=ALLOW
+
+Allow only initramfs
+~~~~~~~~~~~~~~~~~~~~
+
+::
+
+   policy_name=Allow_Initramfs policy_version=0.0.0
+   DEFAULT action=DENY
+
+   op=EXECUTE boot_verified=TRUE action=ALLOW
+
+Allow any signed and validated dm-verity volume and the initramfs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+   policy_name=Allow_Signed_DMV_And_Initramfs policy_version=0.0.0
+   DEFAULT action=DENY
+
+   op=EXECUTE boot_verified=TRUE action=ALLOW
+   op=EXECUTE dmverity_signature=TRUE action=ALLOW
+
+Prohibit execution from a specific dm-verity volume
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+   policy_name=Deny_DMV_By_Roothash policy_version=0.0.0
+   DEFAULT action=DENY
+
+   op=EXECUTE dmverity_roothash=sha256:cd2c5bae7c6c579edaae4353049d58eb5f2e8be0244bf05345bc8e5ed257baff action=DENY
+
+   op=EXECUTE boot_verified=TRUE action=ALLOW
+   op=EXECUTE dmverity_signature=TRUE action=ALLOW
+
+Allow only a specific dm-verity volume
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+   policy_name=Allow_DMV_By_Roothash policy_version=0.0.0
+   DEFAULT action=DENY
+
+   op=EXECUTE dmverity_roothash=sha256:401fcec5944823ae12f62726e8184407a5fa9599783f030dec146938 action=ALLOW
+
+Allow any fs-verity file with a valid built-in signature
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+   policy_name=Allow_Signed_And_Validated_FSVerity policy_version=0.0.0
+   DEFAULT action=DENY
+
+   op=EXECUTE fsverity_signature=TRUE action=ALLOW
+
+Allow execution of a specific fs-verity file
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+::
+
+   policy_name=ALLOW_FSV_By_Digest policy_version=0.0.0
+   DEFAULT action=DENY
+
+   op=EXECUTE fsverity_digest=sha256:fd88f2b8824e197f850bf4c5109bea5cf0ee38104f710843bb72da796ba5af9e action=ALLOW
+
+Additional Information
+----------------------
+
+- `Github Repository <https://github.com/microsoft/ipe>`_
+- :doc:`Developer and design docs for IPE </security/ipe>`
+
+FAQ
+---
+
+Q:
+   What's the difference between other LSMs which provide a measure of
+   trust-based access control?
+
+A:
+
+   In general, there's two other LSMs that can provide similar functionality:
+   IMA, and Loadpin.
+
+   IMA and IPE are functionally very similar. The significant difference between
+   the two is the policy. [#devdoc]_
+
+   Loadpin and IPE differ fairly dramatically, as Loadpin only covers the IPE's
+   kernel read operations, whereas IPE is capable of controlling execution
+   on top of kernel read. The trust model is also different; Loadpin roots its
+   trust in the initial super-block, whereas trust in IPE is stemmed from kernel
+   itself (via ``SYSTEM_TRUSTED_KEYS``).
+
+-----------
+
+.. [#digest_cache_lsm] https://lore.kernel.org/lkml/20240415142436.2545003-1-roberto.sassu@huaweicloud.com/
+
+.. [#interpreters] There is `some interest in solving this issue <https://lore.kernel.org/lkml/20220321161557.495388-1-mic@digikod.net/>`_.
+
+.. [#devdoc] Please see :doc:`the design docs </security/ipe>` for more on
+             this topic.
+
+.. [#switch_root] https://man7.org/linux/man-pages/man8/switch_root.8.html
+
+.. [#dmveritydigests] These hash algorithms are based on values accepted by
+                      the Linux crypto API; IPE does not impose any
+                      restrictions on the digest algorithm itself;
+                      thus, this list may be out of date.
+
+.. [#fsveritydigest] These hash algorithms are based on values accepted by the
+                     kernel's fsverity support; IPE does not impose any
+                     restrictions on the digest algorithm itself;
+                     thus, this list may be out of date.
diff --git a/Documentation/admin-guide/bug-bisect.rst b/Documentation/admin-guide/bug-bisect.rst
index 325c5d0ed34a..585630d14581 100644
--- a/Documentation/admin-guide/bug-bisect.rst
+++ b/Documentation/admin-guide/bug-bisect.rst
@@ -1,76 +1,144 @@
-Bisecting a bug
-+++++++++++++++
+.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
+.. [see the bottom of this file for redistribution information]
 
-Last updated: 28 October 2016
+======================
+Bisecting a regression
+======================
 
-Introduction
-============
+This document describes how to use a ``git bisect`` to find the source code
+change that broke something -- for example when some functionality stopped
+working after upgrading from Linux 6.0 to 6.1.
 
-Always try the latest kernel from kernel.org and build from source. If you are
-not confident in doing that please report the bug to your distribution vendor
-instead of to a kernel developer.
+The text focuses on the gist of the process. If you are new to bisecting the
+kernel, better follow Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
+instead: it depicts everything from start to finish while covering multiple
+aspects even kernel developers occasionally forget. This includes detecting
+situations early where a bisection would be a waste of time, as nobody would
+care about the result -- for example, because the problem happens after the
+kernel marked itself as 'tainted', occurs in an abandoned version, was already
+fixed, or is caused by a .config change you or your Linux distributor performed.
 
-Finding bugs is not always easy. Have a go though. If you can't find it don't
-give up. Report as much as you have found to the relevant maintainer. See
-MAINTAINERS for who that is for the subsystem you have worked on.
+Finding the change causing a kernel issue using a bisection
+===========================================================
 
-Before you submit a bug report read
-'Documentation/admin-guide/reporting-issues.rst'.
+*Note: the following process assumes you prepared everything for a bisection.
+This includes having a Git clone with the appropriate sources, installing the
+software required to build and install kernels, as well as a .config file stored
+in a safe place (the following example assumes '~/prepared_kernel_.config') to
+use as pristine base at each bisection step; ideally, you have also worked out
+a fully reliable and straight-forward way to reproduce the regression, too.*
 
-Devices not appearing
-=====================
-
-Often this is caused by udev/systemd. Check that first before blaming it
-on the kernel.
-
-Finding patch that caused a bug
-===============================
-
-Using the provided tools with ``git`` makes finding bugs easy provided the bug
-is reproducible.
-
-Steps to do it:
-
-- build the Kernel from its git source
-- start bisect with [#f1]_::
-
-	$ git bisect start
-
-- mark the broken changeset with::
-
-	$ git bisect bad [commit]
-
-- mark a changeset where the code is known to work with::
-
-	$ git bisect good [commit]
-
-- rebuild the Kernel and test
-- interact with git bisect by using either::
-
-	$ git bisect good
-
-  or::
-
-	$ git bisect bad
-
-  depending if the bug happened on the changeset you're testing
-- After some interactions, git bisect will give you the changeset that
-  likely caused the bug.
-
-- For example, if you know that the current version is bad, and version
-  4.8 is good, you could do::
-
-           $ git bisect start
-           $ git bisect bad                 # Current version is bad
-           $ git bisect good v4.8
-
-
-.. [#f1] You can, optionally, provide both good and bad arguments at git
-	 start with ``git bisect start [BAD] [GOOD]``
-
-For further references, please read:
-
-- The man page for ``git-bisect``
-- `Fighting regressions with git bisect <https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html>`_
-- `Fully automated bisecting with "git bisect run" <https://lwn.net/Articles/317154>`_
-- `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_
+* Preparation: start the bisection and tell Git about the points in the history
+  you consider to be working and broken, which Git calls 'good' and 'bad'::
+
+     git bisect start
+     git bisect good v6.0
+     git bisect bad v6.1
+
+  Instead of Git tags like 'v6.0' and 'v6.1' you can specify commit-ids, too.
+
+1. Copy your prepared .config into the build directory and adjust it to the
+   needs of the codebase Git checked out for testing::
+
+     cp ~/prepared_kernel_.config .config
+     make olddefconfig
+
+2. Now build, install, and boot a kernel. This might fail for unrelated reasons,
+   for example, when a compile error happens at the current stage of the
+   bisection a later change resolves. In such cases run ``git bisect skip`` and
+   go back to step 1.
+
+3. Check if the functionality that regressed works in the kernel you just built.
+
+   If it works, execute::
+
+     git bisect good
+
+   If it is broken, run::
+
+     git bisect bad
+
+   Note, getting this wrong just once will send the rest of the bisection
+   totally off course. To prevent having to start anew later you thus want to
+   ensure what you tell Git is correct; it is thus often wise to spend a few
+   minutes more on testing in case your reproducer is unreliable.
+
+   After issuing one of these two commands, Git will usually check out another
+   bisection point and print something like 'Bisecting: 675 revisions left to
+   test after this (roughly 10 steps)'. In that case go back to step 1.
+
+   If Git instead prints something like 'cafecaca0c0dacafecaca0c0dacafecaca0c0da
+   is the first bad commit', then you have finished the bisection. In that case
+   move to the next point below. Note, right after displaying that line Git will
+   show some details about the culprit including its patch description; this can
+   easily fill your terminal, so you might need to scroll up to see the message
+   mentioning the culprit's commit-id.
+
+   In case you missed Git's output, you can always run ``git bisect log`` to
+   print the status: it will show how many steps remain or mention the result of
+   the bisection.
+
+* Recommended complementary task: put the bisection log and the current .config
+  file aside for the bug report; furthermore tell Git to reset the sources to
+  the state before the bisection::
+
+     git bisect log > ~/bisection-log
+     cp .config ~/bisection-config-culprit
+     git bisect reset
+
+* Recommended optional task: try reverting the culprit on top of the latest
+  codebase and check if that fixes your bug; if that is the case, it validates
+  the bisection and enables developers to resolve the regression through a
+  revert.
+
+  To try this, update your clone and check out latest mainline. Then tell Git
+  to revert the change by specifying its commit-id::
+
+     git revert --no-edit cafec0cacaca0
+
+  Git might reject this, for example when the bisection landed on a merge
+  commit. In that case, abandon the attempt. Do the same, if Git fails to revert
+  the culprit on its own because later changes depend on it -- at least unless
+  you bisected a stable or longterm kernel series, in which case you want to
+  check out its latest codebase and try a revert there.
+
+  If a revert succeeds, build and test another kernel to check if reverting
+  resolved your regression.
+
+With that the process is complete. Now report the regression as described by
+Documentation/admin-guide/reporting-issues.rst.
+
+
+Additional reading material
+---------------------------
+
+* The `man page for 'git bisect' <https://git-scm.com/docs/git-bisect>`_ and
+  `fighting regressions with 'git bisect' <https://git-scm.com/docs/git-bisect-lk2009.html>`_
+  in the Git documentation.
+* `Working with git bisect <https://nathanchance.dev/posts/working-with-git-bisect/>`_
+  from kernel developer Nathan Chancellor.
+* `Using Git bisect to figure out when brokenness was introduced <http://webchick.net/node/99>`_.
+* `Fully automated bisecting with 'git bisect run' <https://lwn.net/Articles/317154>`_.
+
+..
+   end-of-content
+..
+   This document is maintained by Thorsten Leemhuis <linux@leemhuis.info>. If
+   you spot a typo or small mistake, feel free to let him know directly and
+   he'll fix it. You are free to do the same in a mostly informal way if you
+   want to contribute changes to the text -- but for copyright reasons please CC
+   linux-doc@vger.kernel.org and 'sign-off' your contribution as
+   Documentation/process/submitting-patches.rst explains in the section 'Sign
+   your work - the Developer's Certificate of Origin'.
+..
+   This text is available under GPL-2.0+ or CC-BY-4.0, as stated at the top
+   of the file. If you want to distribute this text under CC-BY-4.0 only,
+   please use 'The Linux kernel development community' for author attribution
+   and link this as source:
+   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/bug-bisect.rst
+
+..
+   Note: Only the content of this RST file as found in the Linux kernel sources
+   is available under CC-BY-4.0, as versions of this text that were processed
+   (for example by the kernel's build system) might contain content taken from
+   files which use a more restrictive license.
diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
index 95299b08c405..1d0f8ceb3075 100644
--- a/Documentation/admin-guide/bug-hunting.rst
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -244,14 +244,14 @@ Reporting the bug
 Once you find where the bug happened, by inspecting its location,
 you could either try to fix it yourself or report it upstream.
 
-In order to report it upstream, you should identify the mailing list
-used for the development of the affected code. This can be done by using
-the ``get_maintainer.pl`` script.
+In order to report it upstream, you should identify the bug tracker, if any, or
+mailing list used for the development of the affected code. This can be done by
+using the ``get_maintainer.pl`` script.
 
 For example, if you find a bug at the gspca's sonixj.c file, you can get
 its maintainers with::
 
-	$ ./scripts/get_maintainer.pl -f drivers/media/usb/gspca/sonixj.c
+	$ ./scripts/get_maintainer.pl --bug -f drivers/media/usb/gspca/sonixj.c
 	Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
 	Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
 	Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
@@ -267,11 +267,12 @@ Please notice that it will point to:
 - The driver maintainer (Hans Verkuil);
 - The subsystem maintainer (Mauro Carvalho Chehab);
 - The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
-- the Linux Kernel mailing list (linux-kernel@vger.kernel.org).
+- The Linux Kernel mailing list (linux-kernel@vger.kernel.org);
+- The bug reporting URIs for the driver/subsystem (none in the above example).
 
-Usually, the fastest way to have your bug fixed is to report it to mailing
-list used for the development of the code (linux-media ML) copying the
-driver maintainer (Hans).
+If the listing contains bug reporting URIs at the end, please prefer them over
+email. Otherwise, please report bugs to the mailing list used for the
+development of the code (linux-media ML) copying the driver maintainer (Hans).
 
 If you are totally stumped as to whom to send the report, and
 ``get_maintainer.pl`` didn't provide you anything useful, send it to
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 86311c2907cd..c65756afd372 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -533,10 +533,12 @@ cgroup namespace on namespace creation.
 Because the resource control interface files in a given directory
 control the distribution of the parent's resources, the delegatee
 shouldn't be allowed to write to them.  For the first method, this is
-achieved by not granting access to these files.  For the second, the
-kernel rejects writes to all files other than "cgroup.procs" and
-"cgroup.subtree_control" on a namespace root from inside the
-namespace.
+achieved by not granting access to these files.  For the second, files
+outside the namespace should be hidden from the delegatee by the means
+of at least mount namespacing, and the kernel rejects writes to all
+files on a namespace root from inside the cgroup namespace, except for
+those files listed in "/sys/kernel/cgroup/delegate" (including
+"cgroup.procs", "cgroup.threads", "cgroup.subtree_control", etc.).
 
 The end results are equivalent for both delegation types.  Once
 delegated, the user can build sub-hierarchy under the directory,
@@ -981,6 +983,14 @@ All cgroup core files are prefixed with "cgroup."
 		A dying cgroup can consume system resources not exceeding
 		limits, which were active at the moment of cgroup deletion.
 
+	  nr_subsys_<cgroup_subsys>
+		Total number of live cgroup subsystems (e.g memory
+		cgroup) at and beneath the current cgroup.
+
+	  nr_dying_subsys_<cgroup_subsys>
+		Total number of dying cgroup subsystems (e.g. memory
+		cgroup) at and beneath the current cgroup.
+
   cgroup.freeze
 	A read-write single value file which exists on non-root cgroups.
 	Allowed values are "0" and "1". The default is "0".
@@ -1717,9 +1727,10 @@ The following nested keys are defined.
 	entries fault back in or are written out to disk.
 
   memory.zswap.writeback
-	A read-write single value file. The default value is "1". The
-	initial value of the root cgroup is 1, and when a new cgroup is
-	created, it inherits the current value of its parent.
+	A read-write single value file. The default value is "1".
+	Note that this setting is hierarchical, i.e. the writeback would be
+	implicitly disabled for child cgroups if the upper hierarchy
+	does so.
 
 	When this is set to 0, all swapping attempts to swapping devices
 	are disabled. This included both zswap writebacks, and swapping due
@@ -2939,8 +2950,8 @@ Deprecated v1 Core Features
 
 - "cgroup.clone_children" is removed.
 
-- /proc/cgroups is meaningless for v2.  Use "cgroup.controllers" file
-  at the root instead.
+- /proc/cgroups is meaningless for v2.  Use "cgroup.controllers" or
+  "cgroup.stat" files at the root instead.
 
 
 Issues with v1 and Rationales for v2
diff --git a/Documentation/admin-guide/device-mapper/dm-crypt.rst b/Documentation/admin-guide/device-mapper/dm-crypt.rst
index 552c9155165d..48a48bd09372 100644
--- a/Documentation/admin-guide/device-mapper/dm-crypt.rst
+++ b/Documentation/admin-guide/device-mapper/dm-crypt.rst
@@ -162,13 +162,18 @@ iv_large_sectors
 
 
 Module parameters::
-
    max_read_size
+      Maximum size of read requests. When a request larger than this size
+      is received, dm-crypt will split the request. The splitting improves
+      concurrency (the split requests could be encrypted in parallel by multiple
+      cores), but it also causes overhead. The user should tune this parameters to
+      fit the actual workload.
+
    max_write_size
-      Maximum size of read or write requests. When a request larger than this size
+      Maximum size of write requests. When a request larger than this size
       is received, dm-crypt will split the request. The splitting improves
       concurrency (the split requests could be encrypted in parallel by multiple
-      cores), but it also causes overhead. The user should tune these parameters to
+      cores), but it also causes overhead. The user should tune this parameters to
       fit the actual workload.
 
 
diff --git a/Documentation/admin-guide/hw-vuln/srso.rst b/Documentation/admin-guide/hw-vuln/srso.rst
index 4bd3ce3ba171..2ad1c05b8c88 100644
--- a/Documentation/admin-guide/hw-vuln/srso.rst
+++ b/Documentation/admin-guide/hw-vuln/srso.rst
@@ -158,3 +158,72 @@ poisoned BTB entry and using that safe one for all function returns.
 In older Zen1 and Zen2, this is accomplished using a reinterpretation
 technique similar to Retbleed one: srso_untrain_ret() and
 srso_safe_ret().
+
+Checking the safe RET mitigation actually works
+-----------------------------------------------
+
+In case one wants to validate whether the SRSO safe RET mitigation works
+on a kernel, one could use two performance counters
+
+* PMC_0xc8 - Count of RET/RET lw retired
+* PMC_0xc9 - Count of RET/RET lw retired mispredicted
+
+and compare the number of RETs retired properly vs those retired
+mispredicted, in kernel mode. Another way of specifying those events
+is::
+
+        # perf list ex_ret_near_ret
+
+        List of pre-defined events (to be used in -e or -M):
+
+        core:
+          ex_ret_near_ret
+               [Retired Near Returns]
+          ex_ret_near_ret_mispred
+               [Retired Near Returns Mispredicted]
+
+Either the command using the event mnemonics::
+
+        # perf stat -e ex_ret_near_ret:k -e ex_ret_near_ret_mispred:k sleep 10s
+
+or using the raw PMC numbers::
+
+        # perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s
+
+should give the same amount. I.e., every RET retired should be
+mispredicted::
+
+        [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s
+
+         Performance counter stats for 'sleep 10s':
+
+                   137,167      cpu/event=0xc8,umask=0/k
+                   137,173      cpu/event=0xc9,umask=0/k
+
+              10.004110303 seconds time elapsed
+
+               0.000000000 seconds user
+               0.004462000 seconds sys
+
+vs the case when the mitigation is disabled (spec_rstack_overflow=off)
+or not functioning properly, showing usually a lot smaller number of
+mispredicted retired RETs vs the overall count of retired RETs during
+a workload::
+
+       [root@brent: ~/kernel/linux/tools/perf> ./perf stat -e cpu/event=0xc8,umask=0/k -e cpu/event=0xc9,umask=0/k sleep 10s
+
+        Performance counter stats for 'sleep 10s':
+
+                  201,627      cpu/event=0xc8,umask=0/k
+                    4,074      cpu/event=0xc9,umask=0/k
+
+             10.003267252 seconds time elapsed
+
+              0.002729000 seconds user
+              0.000000000 seconds sys
+
+Also, there is a selftest which performs the above, go to
+tools/testing/selftests/x86/ and do::
+
+        make srso
+        ./srso
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 09126bb8cc9f..8337d0fed311 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -333,12 +333,17 @@
 					  allowed anymore to lift isolation
 					  requirements as needed. This option
 					  does not override iommu=pt
-			force_enable - Force enable the IOMMU on platforms known
-				       to be buggy with IOMMU enabled. Use this
-				       option with care.
-			pgtbl_v1     - Use v1 page table for DMA-API (Default).
-			pgtbl_v2     - Use v2 page table for DMA-API.
-			irtcachedis  - Disable Interrupt Remapping Table (IRT) caching.
+			force_enable    - Force enable the IOMMU on platforms known
+				          to be buggy with IOMMU enabled. Use this
+				          option with care.
+			pgtbl_v1        - Use v1 page table for DMA-API (Default).
+			pgtbl_v2        - Use v2 page table for DMA-API.
+			irtcachedis     - Disable Interrupt Remapping Table (IRT) caching.
+			nohugepages     - Limit page-sizes used for v1 page-tables
+				          to 4 KiB.
+			v2_pgsizes_only - Limit page-sizes used for v1 page-tables
+				          to 4KiB/2Mib/1GiB.
+
 
 	amd_iommu_dump=	[HW,X86-64]
 			Enable AMD IOMMU driver option to dump the ACPI table
@@ -517,6 +522,18 @@
 			Format: <io>,<irq>,<mode>
 			See header of drivers/net/hamradio/baycom_ser_hdx.c.
 
+	bdev_allow_write_mounted=
+			Format: <bool>
+			Control the ability to open a mounted block device
+			for writing, i.e., allow / disallow writes that bypass
+			the FS. This was implemented as a means to prevent
+			fuzzers from crashing the kernel by overwriting the
+			metadata underneath a mounted FS without its awareness.
+			This also prevents destructive formatting of mounted
+			filesystems by naive storage tooling that don't use
+			O_EXCL. Default is Y and can be changed through the
+			Kconfig option CONFIG_BLK_DEV_WRITE_MOUNTED.
+
 	bert_disable	[ACPI]
 			Disable BERT OS support on buggy BIOSes.
 
@@ -2350,6 +2367,18 @@
 	ipcmni_extend	[KNL,EARLY] Extend the maximum number of unique System V
 			IPC identifiers from 32,768 to 16,777,216.
 
+	ipe.enforce=	[IPE]
+			Format: <bool>
+			Determine whether IPE starts in permissive (0) or
+			enforce (1) mode. The default is enforce.
+
+	ipe.success_audit=
+			[IPE]
+			Format: <bool>
+			Start IPE with success auditing enabled, emitting
+			an audit event when a binary is allowed. The default
+			is 0.
+
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
 
@@ -4788,6 +4817,16 @@
 	printk.time=	Show timing data prefixed to each printk message line
 			Format: <bool>  (1/Y/y=enable, 0/N/n=disable)
 
+	proc_mem.force_override= [KNL]
+			Format: {always | ptrace | never}
+			Traditionally /proc/pid/mem allows memory permissions to be
+			overridden without restrictions. This option may be set to
+			restrict that. Can be one of:
+			- 'always': traditional behavior always allows mem overrides.
+			- 'ptrace': only allow mem overrides for active ptracers.
+			- 'never':  never allow mem overrides.
+			If not specified, default is the CONFIG_PROC_MEM_* choice.
+
 	processor.max_cstate=	[HW,ACPI]
 			Limit processor to maximum C-state
 			max_cstate=9 overrides any DMI blacklist limit.
@@ -4935,6 +4974,10 @@
 			Set maximum number of finished RCU callbacks to
 			process in one batch.
 
+	rcutree.csd_lock_suppress_rcu_stall=	[KNL]
+			Do only a one-line RCU CPU stall warning when
+			there is an ongoing too-long CSD-lock wait.
+
 	rcutree.do_rcu_barrier=	[KNL]
 			Request a call to rcu_barrier().  This is
 			throttled so that userspace tests can safely
@@ -5382,7 +5425,13 @@
 			Time to wait (s) after boot before inducing stall.
 
 	rcutorture.stall_cpu_irqsoff= [KNL]
-			Disable interrupts while stalling if set.
+			Disable interrupts while stalling if set, but only
+			on the first stall in the set.
+
+	rcutorture.stall_cpu_repeat= [KNL]
+			Number of times to repeat the stall sequence,
+			so that rcutorture.stall_cpu_repeat=3 will result
+			in four stall sequences.
 
 	rcutorture.stall_gp_kthread= [KNL]
 			Duration (s) of forced sleep within RCU
@@ -5570,14 +5619,6 @@
 			of zero will disable batching.	Batching is
 			always disabled for synchronize_rcu_tasks().
 
-	rcupdate.rcu_tasks_rude_lazy_ms= [KNL]
-			Set timeout in milliseconds RCU Tasks
-			Rude asynchronous callback batching for
-			call_rcu_tasks_rude().	A negative value
-			will take the default.	A value of zero will
-			disable batching.  Batching is always disabled
-			for synchronize_rcu_tasks_rude().
-
 	rcupdate.rcu_tasks_trace_lazy_ms= [KNL]
 			Set timeout in milliseconds RCU Tasks
 			Trace asynchronous callback batching for
@@ -7352,6 +7393,13 @@
 			it can be updated at runtime by writing to the
 			corresponding sysfs file.
 
+	workqueue.panic_on_stall=<uint>
+			Panic when workqueue stall is detected by
+			CONFIG_WQ_WATCHDOG. It sets the number times of the
+			stall to trigger panic.
+
+			The default is 0, which disables the panic on stall.
+
 	workqueue.cpu_intensive_thresh_us=
 			Per-cpu work items which run for longer than this
 			threshold are automatically considered CPU intensive
diff --git a/Documentation/admin-guide/media/vivid.rst b/Documentation/admin-guide/media/vivid.rst
index 1306f19ecb5a..c9d301ab46a3 100644
--- a/Documentation/admin-guide/media/vivid.rst
+++ b/Documentation/admin-guide/media/vivid.rst
@@ -328,7 +328,7 @@ and an HDMI input, one input for each input type. Those are described in more
 detail below.
 
 Special attention has been given to the rate at which new frames become
-available. The jitter will be around 1 jiffie (that depends on the HZ
+available. The jitter will be around 1 jiffy (that depends on the HZ
 configuration of your kernel, so usually 1/100, 1/250 or 1/1000 of a second),
 but the long-term behavior is exactly following the framerate. So a
 framerate of 59.94 Hz is really different from 60 Hz. If the framerate
diff --git a/Documentation/admin-guide/perf/arm-ni.rst b/Documentation/admin-guide/perf/arm-ni.rst
new file mode 100644
index 000000000000..d26a8f697c36
--- /dev/null
+++ b/Documentation/admin-guide/perf/arm-ni.rst
@@ -0,0 +1,17 @@
+====================================
+Arm Network-on Chip Interconnect PMU
+====================================
+
+NI-700 and friends implement a distinct PMU for each clock domain within the
+interconnect. Correspondingly, the driver exposes multiple PMU devices named
+arm_ni_<x>_cd_<y>, where <x> is an (arbitrary) instance identifier and <y> is
+the clock domain ID within that particular instance. If multiple NI instances
+exist within a system, the PMU devices can be correlated with the underlying
+hardware instance via sysfs parentage.
+
+Each PMU exposes base event aliases for the interface types present in its clock
+domain. These require qualifying with the "eventid" and "nodeid" parameters
+to specify the event code to count and the interface at which to count it
+(per the configured hardware ID as reflected in the xxNI_NODE_INFO register).
+The exception is the "cycles" alias for the PMU cycle counter, which is encoded
+with the PMU node type and needs no further qualification.
diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
index d47cd229d710..39b8e1fdd0cd 100644
--- a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
+++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
@@ -46,16 +46,16 @@ Some of the events only exist for specific configurations.
 DesignWare Cores (DWC) PCIe PMU Driver
 =======================================
 
-This driver adds PMU devices for each PCIe Root Port named based on the BDF of
+This driver adds PMU devices for each PCIe Root Port named based on the SBDF of
 the Root Port. For example,
 
-    30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
+    0001:30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
 
-the PMU device name for this Root Port is dwc_rootport_3018.
+the PMU device name for this Root Port is dwc_rootport_13018.
 
 The DWC PCIe PMU driver registers a perf PMU driver, which provides
 description of available events and configuration options in sysfs, see
-/sys/bus/event_source/devices/dwc_rootport_{bdf}.
+/sys/bus/event_source/devices/dwc_rootport_{sbdf}.
 
 The "format" directory describes format of the config fields of the
 perf_event_attr structure. The "events" directory provides configuration
@@ -66,16 +66,16 @@ The "perf list" command shall list the available events from sysfs, e.g.::
 
     $# perf list | grep dwc_rootport
     <...>
-    dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/        [Kernel PMU event]
+    dwc_rootport_13018/Rx_PCIe_TLP_Data_Payload/        [Kernel PMU event]
     <...>
-    dwc_rootport_3018/rx_memory_read,lane=?/               [Kernel PMU event]
+    dwc_rootport_13018/rx_memory_read,lane=?/               [Kernel PMU event]
 
 Time Based Analysis Event Usage
 -------------------------------
 
 Example usage of counting PCIe RX TLP data payload (Units of bytes)::
 
-    $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
+    $# perf stat -a -e dwc_rootport_13018/Rx_PCIe_TLP_Data_Payload/
 
 The average RX/TX bandwidth can be calculated using the following formula:
 
@@ -88,7 +88,7 @@ Lane Event Usage
 Each lane has the same event set and to avoid generating a list of hundreds
 of events, the user need to specify the lane ID explicitly, e.g.::
 
-    $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/
+    $# perf stat -a -e dwc_rootport_13018/rx_memory_read,lane=4/
 
 The driver does not support sampling, therefore "perf record" will not
 work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
index 5541ff40e06a..083ca50de896 100644
--- a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
+++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
@@ -28,7 +28,9 @@ The "identifier" sysfs file allows users to identify the version of the
 PMU hardware device.
 
 The "bus" sysfs file allows users to get the bus number of Root Ports
-monitored by PMU.
+monitored by PMU. Furthermore users can get the Root Ports range in
+[bdf_min, bdf_max] from "bdf_min" and "bdf_max" sysfs attributes
+respectively.
 
 Example usage of perf::
 
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 7eb3dcd6f4da..8502bc174640 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -16,6 +16,7 @@ Performance monitor support
    starfive_starlink_pmu
    arm-ccn
    arm-cmn
+   arm-ni
    xgene-pmu
    arm_dsu_pmu
    thunderx2-pmu
diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst
index d0324d44f548..210a808b74ec 100644
--- a/Documentation/admin-guide/pm/amd-pstate.rst
+++ b/Documentation/admin-guide/pm/amd-pstate.rst
@@ -251,7 +251,9 @@ performance supported in `AMD CPPC Performance Capability <perf_cap_>`_).
 In some ASICs, the highest CPPC performance is not the one in the ``_CPC``
 table, so we need to expose it to sysfs. If boost is not active, but
 still supported, this maximum frequency will be larger than the one in
-``cpuinfo``.
+``cpuinfo``. On systems that support preferred core, the driver will have
+different values for some cores than others and this will reflect the values
+advertised by the platform at bootup.
 This attribute is read-only.
 
 ``amd_pstate_lowest_nonlinear_freq``
@@ -262,6 +264,17 @@ lowest non-linear performance in `AMD CPPC Performance Capability
 <perf_cap_>`_.)
 This attribute is read-only.
 
+``amd_pstate_hw_prefcore``
+
+Whether the platform supports the preferred core feature and it has been
+enabled. This attribute is read-only.
+
+``amd_pstate_prefcore_ranking``
+
+The performance ranking of the core. This number doesn't have any unit, but
+larger numbers are preferred at the time of reading. This can change at
+runtime based on platform conditions. This attribute is read-only.
+
 ``energy_performance_available_preferences``
 
 A list of all the supported EPP preferences that could be used for
diff --git a/Documentation/admin-guide/ramoops.rst b/Documentation/admin-guide/ramoops.rst
index 6f534a707b2a..2eabef31220d 100644
--- a/Documentation/admin-guide/ramoops.rst
+++ b/Documentation/admin-guide/ramoops.rst
@@ -129,7 +129,7 @@ Setting the ramoops parameters can be done in several different manners:
     takes a size, alignment and name as arguments. The name is used
     to map the memory to a label that can be retrieved by ramoops.
 
-	reserver_mem=2M:4096:oops  ramoops.mem_name=oops
+	reserve_mem=2M:4096:oops  ramoops.mem_name=oops
 
 You can specify either RAM memory or peripheral devices' memory. However, when
 specifying RAM, be sure to reserve the memory by issuing memblock_reserve()
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index f92551539e8a..700aa72eecb1 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -182,3 +182,5 @@ More detailed explanation for tainting
      produce extremely unusual kernel structure layouts (even performance
      pathological ones), which is important to know when debugging. Set at
      build time.
+
+ 18) ``N`` if an in-kernel test, such as a KUnit test, has been run.