<feed xmlns='http://www.w3.org/2005/Atom'>
<title>blaster4385/linux-IllusionX/include, branch v6.12.10</title>
<subtitle>Linux kernel with personal config changes for arch linux</subtitle>
<id>https://git.tablaster.dev/blaster4385/linux-IllusionX/atom?h=v6.12.10</id>
<link rel='self' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/atom?h=v6.12.10'/>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/'/>
<updated>2025-01-21T05:33:13Z</updated>
<entry>
<title>ZEN: Add sysctl and CONFIG to disallow unprivileged CLONE_NEWUSER</title>
<updated>2025-01-21T05:33:13Z</updated>
<author>
<name>Jan Alexander Steffens (heftig)</name>
<email>jan.steffens@gmail.com</email>
</author>
<published>2019-09-16T02:53:20Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=f3add19ac9ec07587b4f519f33b7a6380e9a7842'/>
<id>urn:sha1:f3add19ac9ec07587b4f519f33b7a6380e9a7842</id>
<content type='text'>
Our default behavior continues to match the vanilla kernel.
</content>
</entry>
<entry>
<title>firewall: remove misplaced semicolon from stm32_firewall_get_firewall</title>
<updated>2025-01-17T12:40:59Z</updated>
<author>
<name>guanjing</name>
<email>guanjing@cmss.chinamobile.com</email>
</author>
<published>2024-12-20T08:33:35Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=810aad1d7fdd0afe1808b27e4f4af9cd0bcf08a7'/>
<id>urn:sha1:810aad1d7fdd0afe1808b27e4f4af9cd0bcf08a7</id>
<content type='text'>
[ Upstream commit 155c5bf26f983e9988333eeb0ef217138304d13b ]

Remove misplaced colon in stm32_firewall_get_firewall()
which results in a syntax error when the code is compiled
without CONFIG_STM32_FIREWALL.

Fixes: 5c9668cfc6d7 ("firewall: introduce stm32_firewall framework")
Signed-off-by: guanjing &lt;guanjing@cmss.chinamobile.com&gt;
Reviewed-by: Gatien Chevallier &lt;gatien.chevallier@foss.st.com&gt;
Signed-off-by: Alexandre Torgue &lt;alexandre.torgue@foss.st.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs: kill MNT_ONRB</title>
<updated>2025-01-17T12:40:50Z</updated>
<author>
<name>Christian Brauner</name>
<email>brauner@kernel.org</email>
</author>
<published>2024-12-15T20:17:05Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=1ca9de8867a93ca2b863ba4bf855e174f165ea26'/>
<id>urn:sha1:1ca9de8867a93ca2b863ba4bf855e174f165ea26</id>
<content type='text'>
commit 344bac8f0d73fe970cd9f5b2f132906317d29e8b upstream.

Move mnt-&gt;mnt_node into the union with mnt-&gt;mnt_rcu and mnt-&gt;mnt_llist
instead of keeping it with mnt-&gt;mnt_list. This allows us to use
RB_CLEAR_NODE(&amp;mnt-&gt;mnt_node) in umount_tree() as well as
list_empty(&amp;mnt-&gt;mnt_node). That in turn allows us to remove MNT_ONRB.

This also fixes the bug reported in [1] where seemingly MNT_ONRB wasn't
set in @mnt-&gt;mnt_flags even though the mount was present in the mount
rbtree of the mount namespace.

The root cause is the following race. When a btrfs subvolume is mounted
a temporary mount is created:

btrfs_get_tree_subvol()
{
        mnt = fc_mount()
        // Register the newly allocated mount with sb-&gt;mounts:
        lock_mount_hash();
        list_add_tail(&amp;mnt-&gt;mnt_instance, &amp;mnt-&gt;mnt.mnt_sb-&gt;s_mounts);
        unlock_mount_hash();
}

and registered on sb-&gt;s_mounts. Later it is added to an anonymous mount
namespace via mount_subvol():

-&gt; mount_subvol()
   -&gt; mount_subtree()
      -&gt; alloc_mnt_ns()
         mnt_add_to_ns()
         vfs_path_lookup()
         put_mnt_ns()

The mnt_add_to_ns() call raises MNT_ONRB in @mnt-&gt;mnt_flags. If someone
concurrently does a ro remount:

reconfigure_super()
-&gt; sb_prepare_remount_readonly()
   {
           list_for_each_entry(mnt, &amp;sb-&gt;s_mounts, mnt_instance) {
   }

all mounts registered in sb-&gt;s_mounts are visited and first
MNT_WRITE_HOLD is raised, then MNT_READONLY is raised, and finally
MNT_WRITE_HOLD is removed again.

The flag modification for MNT_WRITE_HOLD/MNT_READONLY and MNT_ONRB race
so MNT_ONRB might be lost.

Fixes: 2eea9ce4310d ("mounts: keep list of mounts in an rbtree")
Cc: &lt;stable@kernel.org&gt; # v6.8+
Link: https://lore.kernel.org/r/20241215-vfs-6-14-mount-work-v1-1-fd55922c4af8@kernel.org
Link: https://lore.kernel.org/r/ec6784ed-8722-4695-980a-4400d4e7bd1a@gmx.com [1]
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>scsi: ufs: qcom: Power off the PHY if it was already powered on in ufs_qcom_power_up_sequence()</title>
<updated>2025-01-17T12:40:45Z</updated>
<author>
<name>Manivannan Sadhasivam</name>
<email>manivannan.sadhasivam@linaro.org</email>
</author>
<published>2024-12-19T16:50:41Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=b1a38fe4789cce90a4f20ee8517c5957e66afc12'/>
<id>urn:sha1:b1a38fe4789cce90a4f20ee8517c5957e66afc12</id>
<content type='text'>
commit 7bac65687510038390a0a54cbe14fba08d037e46 upstream.

PHY might already be powered on during ufs_qcom_power_up_sequence() in a
couple of cases:

 1. During UFSHCD_QUIRK_REINIT_AFTER_MAX_GEAR_SWITCH quirk

 2. Resuming from spm_lvl = 5 suspend

In those cases, it is necessary to call phy_power_off() and phy_exit() in
ufs_qcom_power_up_sequence() function to power off the PHY before calling
phy_init() and phy_power_on().

Case (1) is doing it via ufs_qcom_reinit_notify() callback, but case (2) is
not handled. So to satisfy both cases, call phy_power_off() and phy_exit()
if the phy_count is non-zero. And with this change, the reinit_notify()
callback is no longer needed.

This fixes the below UFS resume failure with spm_lvl = 5:

ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: Enabling the controller failed
ufshcd-qcom 1d84000.ufshc: ufshcd_host_reset_and_restore: Host init failed -5
ufs_device_wlun 0:0:0:49488: ufshcd_wl_resume failed: -5
ufs_device_wlun 0:0:0:49488: PM: dpm_run_callback(): scsi_bus_resume returns -5
ufs_device_wlun 0:0:0:49488: PM: failed to resume async: error -5

Cc: stable@vger.kernel.org # 6.3
Fixes: baf5ddac90dc ("scsi: ufs: ufs-qcom: Add support for reinitializing the UFS device")
Reported-by: Ram Kumar Dwivedi &lt;quic_rdwivedi@quicinc.com&gt;
Tested-by: Amit Pundir &lt;amit.pundir@linaro.org&gt; # on SM8550-HDK
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Tested-by: Neil Armstrong &lt;neil.armstrong@linaro.org&gt; # on SM8550-QRD
Signed-off-by: Manivannan Sadhasivam &lt;manivannan.sadhasivam@linaro.org&gt;
Link: https://lore.kernel.org/r/20241219-ufs-qcom-suspend-fix-v3-1-63c4b95a70b9@linaro.org
Signed-off-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>tcp/dccp: allow a connection when sk_max_ack_backlog is zero</title>
<updated>2025-01-17T12:40:37Z</updated>
<author>
<name>Zhongqiu Duan</name>
<email>dzq.aishenghu0@gmail.com</email>
</author>
<published>2025-01-02T17:14:26Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=e4a92f0d51c6618c1644debaa1ba87b382cf173c'/>
<id>urn:sha1:e4a92f0d51c6618c1644debaa1ba87b382cf173c</id>
<content type='text'>
[ Upstream commit 3479c7549fb1dfa7a1db4efb7347c7b8ef50de4b ]

If the backlog of listen() is set to zero, sk_acceptq_is_full() allows
one connection to be made, but inet_csk_reqsk_queue_is_full() does not.
When the net.ipv4.tcp_syncookies is zero, inet_csk_reqsk_queue_is_full()
will cause an immediate drop before the sk_acceptq_is_full() check in
tcp_conn_request(), resulting in no connection can be made.

This patch tries to keep consistent with 64a146513f8f ("[NET]: Revert
incorrect accept queue backlog changes.").

Link: https://lore.kernel.org/netdev/20250102080258.53858-1-kuniyu@amazon.com/
Fixes: ef547f2ac16b ("tcp: remove max_qlen_log")
Signed-off-by: Zhongqiu Duan &lt;dzq.aishenghu0@gmail.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Link: https://patch.msgid.link/20250102171426.915276-1-dzq.aishenghu0@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>netfs: Fix is-caching check in read-retry</title>
<updated>2025-01-17T12:40:35Z</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2024-12-16T20:34:45Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=6f153055ba050db2ae3dd932ec51f15adf7a648b'/>
<id>urn:sha1:6f153055ba050db2ae3dd932ec51f15adf7a648b</id>
<content type='text'>
[ Upstream commit d4e338de17cb6532bf805fae00db8b41e914009b ]

netfs: Fix is-caching check in read-retry

The read-retry code checks the NETFS_RREQ_COPY_TO_CACHE flag to determine
if there might be failed reads from the cache that need turning into reads
from the server, with the intention of skipping the complicated part if it
can.  The code that set the flag, however, got lost during the read-side
rewrite.

Fix the check to see if the cache_resources are valid instead.  The flag
can then be removed.

Fixes: ee4cdf7ba857 ("netfs: Speed up buffered reading")
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Link: https://lore.kernel.org/r/3752048.1734381285@warthog.procyon.org.uk
cc: Jeff Layton &lt;jlayton@kernel.org&gt;
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>iomap: fix zero padding data issue in concurrent append writes</title>
<updated>2025-01-17T12:40:34Z</updated>
<author>
<name>Long Li</name>
<email>leo.lilong@huawei.com</email>
</author>
<published>2024-12-09T11:42:40Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=82c59a86a247a8970d353d10f52a37e5564fb137'/>
<id>urn:sha1:82c59a86a247a8970d353d10f52a37e5564fb137</id>
<content type='text'>
[ Upstream commit 51d20d1dacbec589d459e11fc88fbca419f84a99 ]

During concurrent append writes to XFS filesystem, zero padding data
may appear in the file after power failure. This happens due to imprecise
disk size updates when handling write completion.

Consider this scenario with concurrent append writes same file:

  Thread 1:                  Thread 2:
  ------------               -----------
  write [A, A+B]
  update inode size to A+B
  submit I/O [A, A+BS]
                             write [A+B, A+B+C]
                             update inode size to A+B+C
  &lt;I/O completes, updates disk size to min(A+B+C, A+BS)&gt;
  &lt;power failure&gt;

After reboot:
  1) with A+B+C &lt; A+BS, the file has zero padding in range [A+B, A+B+C]

  |&lt;         Block Size (BS)      &gt;|
  |DDDDDDDDDDDDDDDD0000000000000000|
  ^               ^        ^
  A              A+B     A+B+C
                         (EOF)

  2) with A+B+C &gt; A+BS, the file has zero padding in range [A+B, A+BS]

  |&lt;         Block Size (BS)      &gt;|&lt;           Block Size (BS)    &gt;|
  |DDDDDDDDDDDDDDDD0000000000000000|00000000000000000000000000000000|
  ^               ^                ^               ^
  A              A+B              A+BS           A+B+C
                                  (EOF)

  D = Valid Data
  0 = Zero Padding

The issue stems from disk size being set to min(io_offset + io_size,
inode-&gt;i_size) at I/O completion. Since io_offset+io_size is block
size granularity, it may exceed the actual valid file data size. In
the case of concurrent append writes, inode-&gt;i_size may be larger
than the actual range of valid file data written to disk, leading to
inaccurate disk size updates.

This patch modifies the meaning of io_size to represent the size of
valid data within EOF in an ioend. If the ioend spans beyond i_size,
io_size will be trimmed to provide the file with more accurate size
information. This is particularly useful for on-disk size updates
at completion time.

After this change, ioends that span i_size will not grow or merge with
other ioends in concurrent scenarios. However, these cases that need
growth/merging rarely occur and it seems no noticeable performance impact.
Although rounding up io_size could enable ioend growth/merging in these
scenarios, we decided to keep the code simple after discussion [1].

Another benefit is that it makes the xfs_ioend_is_append() check more
accurate, which can reduce unnecessary end bio callbacks of xfs_end_bio()
in certain scenarios, such as repeated writes at the file tail without
extending the file size.

Link [1]: https://patchwork.kernel.org/project/xfs/patch/20241113091907.56937-1-leo.lilong@huawei.com

Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure") # goes further back than this
Signed-off-by: Long Li &lt;leo.lilong@huawei.com&gt;
Link: https://lore.kernel.org/r/20241209114241.3725722-3-leo.lilong@huawei.com
Reviewed-by: Brian Foster &lt;bfoster@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>fs/writeback: convert wbc_account_cgroup_owner to take a folio</title>
<updated>2025-01-17T12:40:33Z</updated>
<author>
<name>Pankaj Raghav</name>
<email>p.raghav@samsung.com</email>
</author>
<published>2024-09-26T14:01:21Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=f40881bde8f64d5c237424ea571e6ddd92209ab2'/>
<id>urn:sha1:f40881bde8f64d5c237424ea571e6ddd92209ab2</id>
<content type='text'>
[ Upstream commit 30dac24e14b52e1787572d1d4e06eeabe8a63630 ]

Most of the callers of wbc_account_cgroup_owner() are converting a folio
to page before calling the function. wbc_account_cgroup_owner() is
converting the page back to a folio to call mem_cgroup_css_from_folio().

Convert wbc_account_cgroup_owner() to take a folio instead of a page,
and convert all callers to pass a folio directly except f2fs.

Convert the page to folio for all the callers from f2fs as they were the
only callers calling wbc_account_cgroup_owner() with a page. As f2fs is
already in the process of converting to folios, these call sites might
also soon be calling wbc_account_cgroup_owner() with a folio directly in
the future.

No functional changes. Only compile tested.

Signed-off-by: Pankaj Raghav &lt;p.raghav@samsung.com&gt;
Link: https://lore.kernel.org/r/20240926140121.203821-1-kernel@pankajraghav.com
Acked-by: David Sterba &lt;dsterba@suse.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
Stable-dep-of: 51d20d1dacbe ("iomap: fix zero padding data issue in concurrent append writes")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
</entry>
<entry>
<title>mm: hugetlb: independent PMD page table shared count</title>
<updated>2025-01-09T12:33:54Z</updated>
<author>
<name>Liu Shixin</name>
<email>liushixin2@huawei.com</email>
</author>
<published>2024-12-16T07:11:47Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=2e31443a0d18ae43b9d29e02bf0563f07772193d'/>
<id>urn:sha1:2e31443a0d18ae43b9d29e02bf0563f07772193d</id>
<content type='text'>
commit 59d9094df3d79443937add8700b2ef1a866b1081 upstream.

The folio refcount may be increased unexpectly through try_get_folio() by
caller such as split_huge_pages.  In huge_pmd_unshare(), we use refcount
to check whether a pmd page table is shared.  The check is incorrect if
the refcount is increased by the above caller, and this can cause the page
table leaked:

 BUG: Bad page state in process sh  pfn:109324
 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
 flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff)
 page_type: f2(table)
 raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000
 raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000
 page dumped because: nonzero mapcount
 ...
 CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G    B              6.13.0-rc2master+ #7
 Tainted: [B]=BAD_PAGE
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  show_stack+0x20/0x38 (C)
  dump_stack_lvl+0x80/0xf8
  dump_stack+0x18/0x28
  bad_page+0x8c/0x130
  free_page_is_bad_report+0xa4/0xb0
  free_unref_page+0x3cc/0x620
  __folio_put+0xf4/0x158
  split_huge_pages_all+0x1e0/0x3e8
  split_huge_pages_write+0x25c/0x2d8
  full_proxy_write+0x64/0xd8
  vfs_write+0xcc/0x280
  ksys_write+0x70/0x110
  __arm64_sys_write+0x24/0x38
  invoke_syscall+0x50/0x120
  el0_svc_common.constprop.0+0xc8/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x34/0x128
  el0t_64_sync_handler+0xc8/0xd0
  el0t_64_sync+0x190/0x198

The issue may be triggered by damon, offline_page, page_idle, etc, which
will increase the refcount of page table.

1. The page table itself will be discarded after reporting the
   "nonzero mapcount".

2. The HugeTLB page mapped by the page table miss freeing since we
   treat the page table as shared and a shared page table will not be
   unmapped.

Fix it by introducing independent PMD page table shared count.  As
described by comment, pt_index/pt_mm/pt_frag_refcount are used for s390
gmap, x86 pgds and powerpc, pt_share_count is used for x86/arm64/riscv
pmds, so we can reuse the field as pt_share_count.

Link: https://lkml.kernel.org/r/20241216071147.3984217-1-liushixin2@huawei.com
Fixes: 39dde65c9940 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Liu Shixin &lt;liushixin2@huawei.com&gt;
Cc: Kefeng Wang &lt;wangkefeng.wang@huawei.com&gt;
Cc: Ken Chen &lt;kenneth.w.chen@intel.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Nanyong Sun &lt;sunnanyong@huawei.com&gt;
Cc: Jane Chu &lt;jane.chu@oracle.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
<entry>
<title>mm: reinstate ability to map write-sealed memfd mappings read-only</title>
<updated>2025-01-09T12:33:54Z</updated>
<author>
<name>Lorenzo Stoakes</name>
<email>lorenzo.stoakes@oracle.com</email>
</author>
<published>2024-11-28T15:06:17Z</published>
<link rel='alternate' type='text/html' href='https://git.tablaster.dev/blaster4385/linux-IllusionX/commit/?id=464770df46095e6967334d77113972960f7ef1fa'/>
<id>urn:sha1:464770df46095e6967334d77113972960f7ef1fa</id>
<content type='text'>
commit 8ec396d05d1b737c87311fb7311f753b02c2a6b1 upstream.

Patch series "mm: reinstate ability to map write-sealed memfd mappings
read-only".

In commit 158978945f31 ("mm: perform the mapping_map_writable() check
after call_mmap()") (and preceding changes in the same series) it became
possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only.

Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
behaviour") unintentionally undid this logic by moving the
mapping_map_writable() check before the shmem_mmap() hook is invoked,
thereby regressing this change.

This series reworks how we both permit write-sealed mappings being mapped
read-only and disallow mprotect() from undoing the write-seal, fixing this
regression.

We also add a regression test to ensure that we do not accidentally
regress this in future.

Thanks to Julian Orth for reporting this regression.


This patch (of 2):

In commit 158978945f31 ("mm: perform the mapping_map_writable() check
after call_mmap()") (and preceding changes in the same series) it became
possible to mmap() F_SEAL_WRITE sealed memfd mappings read-only.

This was previously unnecessarily disallowed, despite the man page
documentation indicating that it would be, thereby limiting the usefulness
of F_SEAL_WRITE logic.

We fixed this by adapting logic that existed for the F_SEAL_FUTURE_WRITE
seal (one which disallows future writes to the memfd) to also be used for
F_SEAL_WRITE.

For background - the F_SEAL_FUTURE_WRITE seal clears VM_MAYWRITE for a
read-only mapping to disallow mprotect() from overriding the seal - an
operation performed by seal_check_write(), invoked from shmem_mmap(), the
f_op-&gt;mmap() hook used by shmem mappings.

By extending this to F_SEAL_WRITE and critically - checking
mapping_map_writable() to determine if we may map the memfd AFTER we
invoke shmem_mmap() - the desired logic becomes possible.  This is because
mapping_map_writable() explicitly checks for VM_MAYWRITE, which we will
have cleared.

Commit 5de195060b2e ("mm: resolve faulty mmap_region() error path
behaviour") unintentionally undid this logic by moving the
mapping_map_writable() check before the shmem_mmap() hook is invoked,
thereby regressing this change.

We reinstate this functionality by moving the check out of shmem_mmap()
and instead performing it in do_mmap() at the point at which VMA flags are
being determined, which seems in any case to be a more appropriate place
in which to make this determination.

In order to achieve this we rework memfd seal logic to allow us access to
this information using existing logic and eliminate the clearing of
VM_MAYWRITE from seal_check_write() which we are performing in do_mmap()
instead.

Link: https://lkml.kernel.org/r/99fc35d2c62bd2e05571cf60d9f8b843c56069e0.1732804776.git.lorenzo.stoakes@oracle.com
Fixes: 5de195060b2e ("mm: resolve faulty mmap_region() error path behaviour")
Signed-off-by: Lorenzo Stoakes &lt;lorenzo.stoakes@oracle.com&gt;
Reported-by: Julian Orth &lt;ju.orth@gmail.com&gt;
Closes: https://lore.kernel.org/all/CAHijbEUMhvJTN9Xw1GmbM266FXXv=U7s4L_Jem5x3AaPZxrYpQ@mail.gmail.com/
Cc: Jann Horn &lt;jannh@google.com&gt;
Cc: Liam R. Howlett &lt;Liam.Howlett@Oracle.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Shuah Khan &lt;shuah@kernel.org&gt;
Cc: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
</entry>
</feed>
