|
Add a new knob, /sys/class/bdi/<bdi>/strict_limit. This new knob allows
the BDI_CAP_STRICTLIMIT flag to be set or cleared in the bdi
capabilities.
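As a usage illustration (not part of the patch), a minimal C sketch that
enables strict limiting on one bdi; the device name "259:0" and the error
handling here are assumptions:
/* Minimal sketch: set the new strict_limit knob for one bdi.
 * The bdi name "259:0" is a placeholder assumption; use the actual
 * major:minor listed under /sys/class/bdi/. Typically needs root. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/class/bdi/259:0/strict_limit", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	fputs("1", f);	/* "1" sets BDI_CAP_STRICTLIMIT, "0" clears it */
	fclose(f);
	return 0;
}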
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stefan Roesch <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/block: add bdi sysfs knobs", v4.
At Meta, network block devices (nbd) are used to implement remote block
storage. In testing and during production, it has been observed that these
network block devices can consume a huge portion of the dirty writeback
cache and that writeback can take a considerable time.
To be able to set stricter limits, I'm proposing the following changes:
1) Introduce a strictlimit knob.
Currently the max_ratio knob exists to limit dirty memory. However,
this knob only applies once (dirty_ratio + dirty_background_ratio) / 2
has been reached.
With the BDI_CAP_STRICTLIMIT flag, the max_ratio can be applied even
before that limit is reached. This change exposes that flag as a knob.
This knob can also be useful for NFS, FUSE filesystems and USB devices.
2) Use parts of 1000000 for internal calculations.
The max_ratio is based on percentages. With current machine sizes,
percentage steps are very coarse (1% of 256GB of main memory is already
2.5GB). This change uses parts of 1000000 instead of percentages for the
internal calculations.
3) Introduce two new sysfs knobs: min_bytes and max_bytes.
Currently all calculations are based on ratios, but for a user it is often
more convenient to specify a limit in bytes. The new knobs will not
store byte values; instead they will translate the byte value to a
corresponding ratio. As the internal values are now parts of 1000000, the
ratio is close to the specified value. However, the value should be seen
more as an approximation, as it can fluctuate over time (a sketch of this
translation follows after the list).
4) Introduce two new sysfs knobs: min_ratio_fine and max_ratio_fine.
The granularity of the existing sysfs bdi knobs min_ratio and max_ratio
is based on percentage values. The new sysfs bdi knobs min_ratio_fine
and max_ratio_fine allow specifying the ratio in parts of 1 million.
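A rough, runnable model of the byte-to-ratio translation sketched in (3);
the helper name and the parts-per-million arithmetic are assumptions based
on the description above, not the kernel's actual code:
/* Model: translate a byte limit into a parts-per-1000000 ratio of
 * dirtyable memory. bytes_to_ratio_fine() is illustrative only. */
#include <stdio.h>

static unsigned long long bytes_to_ratio_fine(unsigned long long bytes,
					      unsigned long long dirtyable)
{
	return bytes * 1000000ULL / dirtyable;
}

int main(void)
{
	unsigned long long dirtyable = 256ULL << 30;	/* 256 GB */
	unsigned long long limit = 2560ULL << 20;	/* 2.5 GB byte limit */

	/* 2.5 GB of 256 GB is ~9765 ppm, i.e. ~0.98% -- a finer step
	 * than the 1% granularity of the old percentage knobs. */
	printf("max_ratio_fine ~= %llu ppm\n",
	       bytes_to_ratio_fine(limit, dirtyable));
	return 0;
}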
This patch (of 20):
This adds the bdi_set_strict_limit function to be able to set/unset the
BDI_CAP_STRICTLIMIT flag.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stefan Roesch <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Chris Mason <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Prevent a kconfig warning that is caused by TEST_MAPLE_TREE by adding a
"depends on" clause for TEST_MAPLE_TREE since 'select' does not follow any
kconfig dependencies.
WARNING: unmet direct dependencies detected for DEBUG_MAPLE_TREE
Depends on [n]: DEBUG_KERNEL [=n]
Selected by [y]:
- TEST_MAPLE_TREE [=y] && RUNTIME_TESTING_MENU [=y]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 120b116208a0 ("maple_tree: reorganize testing to restore module testing")
Signed-off-by: Randy Dunlap <[email protected]>
Reported-by: Geert Uytterhoeven <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This reverts commit ac801e7e252c5588325e3c983c7d4167fc68c024.
The patch in question was picked to -mm from the KMSAN v6 patch series
(https://lore.kernel.org/linux-mm/[email protected]/)
and sneaked into mainline despite its removal from the v7 series
(https://lore.kernel.org/linux-mm/[email protected]/)
Currently KMSAN does not warn about origin chains hitting the maximum
depth, so keeping @tlb poisoned won't result in any inconveniences.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
There are no more callers of try_to_release_page(), so remove it. This
saves 85 bytes of kernel text.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Replace try_to_release_page() with filemap_release_folio(). This change
is in preparation for the removal of the try_to_release_page() wrapper.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Replace some calls with their folio equivalents. This change removes 4
calls to compound_head() and is in preparation for the removal of the
try_to_release_page() wrapper.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Removing the try_to_release_page() wrapper", v3.
This patchset replaces the remaining calls of try_to_release_page() with
the folio equivalent: filemap_release_folio(). This allows us to remove
the wrapper.
This patch (of 4):
Convert move_extent_per_page() to use folios. This change removes 5 calls
to compound_head() and is in preparation for the removal of the
try_to_release_page() wrapper.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
While freeing a large list, the zone lock will be released and reacquired
to avoid long hold times, since commit c24ad77d962c ("mm/page_alloc.c:
avoid excessive IRQ disabled times in free_unref_page_list()"). As
suggested by Vlastimil Babka, the lock release/reacquire logic can be
simplified by reusing the logic that acquires a different lock when
changing zones.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The pcp_spin_lock_irqsave protecting the PCP lists is IRQ-safe as a task
allocating from the PCP must not re-enter the allocator from IRQ context.
In each instance where IRQ-reentrancy is possible, the lock is acquired
using pcp_spin_trylock_irqsave() even though IRQs are disabled and
re-entrancy is impossible.
Demoting the lock to pcp_spin_lock avoids an IRQ disable/enable in the
common case at the cost of some IRQ allocations taking a slower path. If
the PCP lists need to be refilled, the zone lock still needs to disable
IRQs, but that will only happen on PCP refill and drain. If an IRQ is
raised when a PCP allocation is in progress, the trylock will fail and
fall back to using the buddy lists directly. Note that this may not be a
universal win if an interrupt-intensive workload also allocates heavily
from interrupt context and contends heavily on the zone->lock as a result.
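A loose userspace analogy of the trylock-with-fallback pattern, using POSIX
spinlocks rather than the kernel's pcp_spin_trylock(); everything here is a
model, not kernel code:
/* Userspace model of the pattern: try the fast-path lock without
 * disabling anything; if it is already held (e.g. by a context we
 * interrupted), take a slower path instead of spinning. */
#include <pthread.h>
#include <stdio.h>

static pthread_spinlock_t pcp_lock;

static void alloc_fast_path(void) { puts("allocated from per-cpu list"); }
static void alloc_slow_path(void) { puts("fell back to buddy lists"); }

static void allocate(void)
{
	if (pthread_spin_trylock(&pcp_lock) == 0) {
		alloc_fast_path();
		pthread_spin_unlock(&pcp_lock);
	} else {
		alloc_slow_path();
	}
}

int main(void)
{
	pthread_spin_init(&pcp_lock, PTHREAD_PROCESS_PRIVATE);
	allocate();		/* uncontended: takes the fast path */
	pthread_spin_destroy(&pcp_lock);
	return 0;
}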
[[email protected]: migratetype might be wrong if a PCP was locked]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: reported lockdep issue on IO completion from softirq]
[[email protected]: fix list corruption, lock improvements, micro-optimisations]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Leave IRQs enabled for per-cpu page allocations", v3.
This patch (of 2):
free_unref_page_list() has neglected to properly remove pages from the
list of pages to free since forever. It worked by coincidence because
list_add happened to do the right thing when adding the pages to just the
PCP lists. However, a later patch added pages to either the PCP list or
the zone list, but only properly deleted the page from the list in one
path, leading to list corruption and a subsequent failure. As a
preparation patch, always delete the pages from one list properly before
adding them to another. On its own, this fixes nothing, and it adds a
fractional amount of overhead, but it is critical to the next patch.
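A self-contained toy model of the bug class being prepared against here;
the list helpers are simplified local copies of the kernel's circular-list
idiom, not the kernel code:
/* Why a node must be deleted from one list before being added to
 * another: re-linking without deletion leaves the first list pointing
 * at a node whose prev/next now belong to the second list, i.e. list
 * corruption. */
#include <stdio.h>

struct node { struct node *prev, *next; };

static void list_init(struct node *h) { h->prev = h->next = h; }

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_add(struct node *n, struct node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

int main(void)
{
	struct node list_a, list_b, page;

	list_init(&list_a);
	list_init(&list_b);
	list_add(&page, &list_a);

	list_del(&page);	/* the preparation patch: always unlink first */
	list_add(&page, &list_b);

	printf("list_a empty: %s\n", list_a.next == &list_a ? "yes" : "no");
	return 0;
}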
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mel Gorman <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This test was overlooked, leaving a hard-coded mntpoint path in the test,
when we removed the hugetlb mntpoint in commit 0796c7b8be84. Fix it up so
the test can keep running.
Link: https://lkml.kernel.org/r/Y3aojfUC2nSwbCzB@x1n
Fixes: 0796c7b8be84 ("selftests/vm: drop mnt point for hugetlb in run_vmtests.sh")
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Joel Savitz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We don't show num_reads and num_writes since we removed the corresponding
sysfs nodes in 2017. Block layer stats are exposed via the
/sys/block/zramX/stat file.
However, we still increment those atomic vars and store them in zram
stats. Remove the leftovers.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sergey Senozhatsky <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Cc: Nitin Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
mm/migrate.c:1198:24: warning: Using plain integer as NULL pointer
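The class of fix, as a minimal standalone illustration; the actual variable
at mm/migrate.c:1198 is not shown in this log:
/* Illustration only: initialize pointers with NULL, not the plain
 * integer 0, which is what the static checker complains about. */
#include <stddef.h>

int main(void)
{
	void *bad  = 0;		/* "Using plain integer as NULL pointer" */
	void *good = NULL;	/* the expected spelling */

	return (bad == good) ? 0 : 1;
}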
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3080
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Li <[email protected]>
Reported-by: Abaci Robot <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
NODE_DATA() is preallocated for all possible nodes after commit
09f49dca570a ("mm: handle uninitialized numa nodes gracefully"). Checking
its return value against NULL is now unnecessary.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
hugetlb does not support fake write-faults (write faults without write
permissions). However, we are currently able to trigger a
FAULT_FLAG_WRITE fault on a VMA without VM_WRITE.
If we'd ever want to support FOLL_FORCE|FOLL_WRITE, we'd have to teach
hugetlb to:
(1) Leave the page mapped R/O after the fake write-fault, like
maybe_mkwrite() does.
(2) Allow writing to an exclusive anon page that's mapped R/O when
FOLL_FORCE is set, like can_follow_write_pte(). E.g.,
__follow_hugetlb_must_fault() needs adjustment.
For now, it's not clear if that added complexity is really required.
History has told us that FOLL_FORCE is dangerous and that we'd better
limit its use to a bare minimum.
--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <linux/mman.h>
int main(int argc, char **argv)
{
	char *map;
	int mem_fd;

	map = mmap(NULL, 2 * 1024 * 1024u, PROT_READ,
		   MAP_PRIVATE|MAP_ANON|MAP_HUGETLB|MAP_HUGE_2MB, -1, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed: %d\n", errno);
		return 1;
	}

	mem_fd = open("/proc/self/mem", O_RDWR);
	if (mem_fd < 0) {
		fprintf(stderr, "open(/proc/self/mem) failed: %d\n", errno);
		return 1;
	}

	if (pwrite(mem_fd, "0", 1, (uintptr_t) map) == 1) {
		fprintf(stderr, "write() succeeded, which is unexpected\n");
		return 1;
	}

	printf("write() failed as expected: %d\n", errno);
	return 0;
}
--------------------------------------------------------------------------
Fortunately, we have had a sanity check in hugetlb_wp() in place ever
since commit 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared
mappings"), which bails out instead of silently mapping a page writable in
a !PROT_WRITE VMA.
Consequently, the above reproducer triggers a warning, similar to the one
reported by syzbot:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 3612 at mm/hugetlb.c:5313 hugetlb_wp+0x20a/0x1af0 mm/hugetlb.c:5313
Modules linked in:
CPU: 1 PID: 3612 Comm: syz-executor250 Not tainted 6.1.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
RIP: 0010:hugetlb_wp+0x20a/0x1af0 mm/hugetlb.c:5313
Code: ea 03 80 3c 02 00 0f 85 31 14 00 00 49 8b 5f 20 31 ff 48 89 dd 83 e5 02 48 89 ee e8 70 ab b7 ff 48 85 ed 75 5b e8 76 ae b7 ff <0f> 0b 41 bd 40 00 00 00 e8 69 ae b7 ff 48 b8 00 00 00 00 00 fc ff
RSP: 0018:ffffc90003caf620 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000008640070 RCX: 0000000000000000
RDX: ffff88807b963a80 RSI: ffffffff81c4ed2a RDI: 0000000000000007
RBP: 0000000000000000 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000008c07e R12: ffff888023805800
R13: 0000000000000000 R14: ffffffff91217f38 R15: ffff88801d4b0360
FS: 0000555555bba300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fff7a47a1b8 CR3: 000000002378d000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
hugetlb_no_page mm/hugetlb.c:5755 [inline]
hugetlb_fault+0x19cc/0x2060 mm/hugetlb.c:5874
follow_hugetlb_page+0x3f3/0x1850 mm/hugetlb.c:6301
__get_user_pages+0x2cb/0xf10 mm/gup.c:1202
__get_user_pages_locked mm/gup.c:1434 [inline]
__get_user_pages_remote+0x18f/0x830 mm/gup.c:2187
get_user_pages_remote+0x84/0xc0 mm/gup.c:2260
__access_remote_vm+0x287/0x6b0 mm/memory.c:5517
ptrace_access_vm+0x181/0x1d0 kernel/ptrace.c:61
generic_ptrace_pokedata kernel/ptrace.c:1323 [inline]
ptrace_request+0xb46/0x10c0 kernel/ptrace.c:1046
arch_ptrace+0x36/0x510 arch/x86/kernel/ptrace.c:828
__do_sys_ptrace kernel/ptrace.c:1296 [inline]
__se_sys_ptrace kernel/ptrace.c:1269 [inline]
__x64_sys_ptrace+0x178/0x2a0 kernel/ptrace.c:1269
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]
So let's silence that warning by teaching GUP code that FOLL_FORCE -- so
far -- does not apply to hugetlb.
Note that FOLL_FORCE for read access seems to be working as expected. The
assumption is that this has been broken forever; only since the above
commit do we actually detect the wrong handling and WARN_ON_ONCE().
I assume this has been broken at least since 2014, when mm/gup.c came to
life. I failed to come up with a suitable Fixes tag quickly.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1d8d14641fd9 ("mm/hugetlb: support write-faults in shared mappings")
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.
FOLL_FORCE in this case seems to be due to copy-and-paste from other
drivers. Let's just remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Oded Gabbay <[email protected]>
Cc: Oded Gabbay <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.
FOLL_FORCE in this case seems to be a legacy leftover. Let's just remove
it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.
FOLL_FORCE in this case seems to be a legacy leftover. Let's just remove
it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
FOLL_FORCE is really only for ptrace access. According to commit
707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
writable"), get_vaddr_frames() currently pins all pages writable as a
workaround for issues with read-only buffers.
FOLL_FORCE, however, seems to be a legacy leftover as it predates
commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
always writable"). Let's just remove it.
Once the read-only buffer issue has been resolved, FOLL_WRITE could
again be set depending on the DMA direction.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Acked-by: Hans Verkuil <[email protected]>
Acked-by: Tomasz Figa <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
FOLL_FORCE is really only for ptrace access. R/O pinning a page is
supposed to fail if the VMA lacks the proper access permissions (no
VM_READ).
Let's just remove FOLL_FORCE usage here; there would have to be a pretty
good reason to allow arbitrary drivers to R/O pin pages in a PROT_NONE
VMA. Most probably, FOLL_FORCE usage is just some legacy leftover.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Hans Verkuil <[email protected]>
Cc: Andy Walls <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
commit cd5297b0855f ("drm/etnaviv: Use FOLL_FORCE for userptr")
documents that FOLL_FORCE | FOLL_WRITE was really only used for reliable
R/O pinning.
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Russell King <[email protected]>
Cc: Christian Gmeiner <[email protected]>
Cc: David Airlie <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Acked-by: Hans Verkuil <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Cc: Bernard Metzler <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Cc: Christian Benvenuti <[email protected]>
Cc: Nelson Escobar <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
Link: https://lkml.kernel.org/r/[email protected]
Tested-by: Leon Romanovsky <[email protected]> [over mlx4 and mlx5]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We already support reliable R/O pinning of anonymous memory. However,
assume we end up pinning (R/O long-term) a pagecache page or the shared
zeropage inside a writable private ("COW") mapping. The next write access
will trigger a write-fault and replace the pinned page by an exclusive
anonymous page in the process page tables to break COW: the pinned page no
longer corresponds to the page mapped into the process' page table.
Now that FAULT_FLAG_UNSHARE can break COW on anything mapped into a
COW mapping, let's properly break COW first before R/O long-term
pinning something that's not an exclusive anon page inside a COW
mapping. FAULT_FLAG_UNSHARE will break COW and map an exclusive anon page
instead that can get pinned safely.
With this change, we can stop using FOLL_FORCE|FOLL_WRITE for reliable
R/O long-term pinning in COW mappings.
With this change, the new R/O long-term pinning tests for non-anonymous
memory succeed:
# [RUN] R/O longterm GUP pin ... with shared zeropage
ok 151 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd
ok 152 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with tmpfile
ok 153 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with huge zeropage
ok 154 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd hugetlb (2048 kB)
ok 155 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd hugetlb (1048576 kB)
ok 156 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with shared zeropage
ok 157 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd
ok 158 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with tmpfile
ok 159 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with huge zeropage
ok 160 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (2048 kB)
ok 161 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (1048576 kB)
ok 162 Longterm R/O pin is reliable
Note 1: We don't care about short-term R/O pinning, because such pins have
snapshot semantics: they are not supposed to observe modifications that
happen after pinning.
As one example, assume we start direct I/O to read from a page and store
page content into a file: modifications to page content after starting
direct I/O are not guaranteed to end up in the file. So even if we'd pin
the shared zeropage, the end result would be as expected -- getting zeroes
stored to the file.
Note 2: For shared mappings, we'll now always fall back to the slow path
to look up the VMA when R/O long-term pinning. While that's the necessary
price we have to pay right now, it's actually not that bad in practice:
most FOLL_LONGTERM users already specify FOLL_WRITE, for example, along
with FOLL_FORCE, because they tried dealing with COW mappings correctly ...
Note 3: For users that use FOLL_LONGTERM right now without FOLL_WRITE,
such as VFIO, we'd now no longer pin the shared zeropage. Instead, we'd
populate exclusive anon pages that we can pin. There was a concern that
this could affect the memlock limit of existing setups.
For example, a VM running with VFIO could run into the memlock limit and
fail to run. However, we essentially had the same behavior already in
commit 17839856fd58 ("gup: document and work around "COW can break either
way" issue"), which got merged into some enterprise distros, and there
were no such complaints. So most probably, we're fine.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Daniel Vetter <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Extend FAULT_FLAG_UNSHARE to break COW on anything mapped into a
COW (i.e., private writable) mapping and adjust the documentation
accordingly.
FAULT_FLAG_UNSHARE will now also break COW when encountering the shared
zeropage, a pagecache page, a PFNMAP, ... inside a COW mapping, by
properly replacing the mapped page/pfn by a private copy (an exclusive
anonymous page).
Note that only do_wp_page() needs care: hugetlb_wp() already handles
FAULT_FLAG_UNSHARE correctly. wp_huge_pmd()/wp_huge_pud() also handle it
correctly, for example, splitting the huge zeropage on FAULT_FLAG_UNSHARE
such that we can handle FAULT_FLAG_UNSHARE on the PTE level.
This change is a requirement for reliable long-term R/O pinning in
COW mappings.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
If we already have a PMD/PUD mapped write-protected in a private mapping
and we want to break COW, either due to FAULT_FLAG_WRITE or
FAULT_FLAG_UNSHARE, there is no need to inform the file system, just as on
the PTE path.
Let's just split (->zap) + fallback in that case.
This is a preparation for more generic FAULT_FLAG_UNSHARE support in
COW mappings.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We want to extend FAULT_FLAG_UNSHARE support to anything mapped into a
COW mapping (pagecache page, zeropage, PFN, ...), not just anonymous pages.
Let's prepare for that by handling shared mappings first such that we can
handle private mappings last.
While at it, use folio-based functions instead of page-based functions
where we touch the code either way.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's catch abuse of FAULT_FLAG_WRITE early, such that we don't have to
care in all other handlers and might get "surprises" if we forget to do
so.
Write faults without VM_MAYWRITE don't make any sense, and our
maybe_mkwrite() logic could have hidden such abuse until now.
Write faults without VM_WRITE on something that is not a COW mapping are
similarly broken; e.g., do_wp_page() could end up placing an
anonymous page into a shared mapping, which would be bad.
This is a preparation for reliable R/O long-term pinning of pages in
private mappings, whereby we want to make sure that we will never break
COW in a read-only private mapping.
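Modeled standalone below; the flag values are invented, but
is_cow_mapping() mirrors the kernel's definition and the decision logic
follows the description above (the helper name is illustrative):
/* Illustrative model of the early sanity check: reject FAULT_FLAG_WRITE
 * on VMAs that could never legally be written. */
#include <stdbool.h>
#include <stdio.h>

#define VM_WRITE	(1u << 0)
#define VM_MAYWRITE	(1u << 1)
#define VM_SHARED	(1u << 2)

static bool is_cow_mapping(unsigned int vm_flags)
{
	/* private (non-shared) mapping that may become writable */
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

static bool sane_write_fault(unsigned int vm_flags)
{
	if (!(vm_flags & VM_MAYWRITE))
		return false;	/* write fault can never make sense here */
	if (!(vm_flags & VM_WRITE) && !is_cow_mapping(vm_flags))
		return false;	/* !VM_WRITE is only OK when breaking COW */
	return true;
}

int main(void)
{
	printf("R/O shared mapping write fault sane? %s\n",
	       sane_write_fault(VM_MAYWRITE | VM_SHARED) ? "yes" : "no");
	return 0;
}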
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
For now, FAULT_FLAG_UNSHARE only applies to anonymous pages, which
implies a COW mapping. Let's hide FAULT_FLAG_UNSHARE early if we're not
dealing with a COW mapping, such that we treat it like a read fault as
documented and don't have to worry about the flag throughout all fault
handlers.
While at it, centralize the check for mutual exclusion of
FAULT_FLAG_UNSHARE and FAULT_FLAG_WRITE and just drop the check that
either flag is set in the WP handler.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's test whether R/O long-term pinning is reliable for non-anonymous
memory: when R/O long-term pinning a page, the expectation is that we
break COW early before pinning, such that actual write access via the
page tables won't break COW later and end up replacing the R/O-pinned
page in the page table.
Consequently, R/O long-term pinning in private mappings would only target
exclusive anonymous pages.
For now, all tests fail:
# [RUN] R/O longterm GUP pin ... with shared zeropage
not ok 151 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd
not ok 152 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with tmpfile
not ok 153 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with huge zeropage
not ok 154 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd hugetlb (2048 kB)
not ok 155 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd hugetlb (1048576 kB)
not ok 156 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with shared zeropage
not ok 157 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd
not ok 158 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with tmpfile
not ok 159 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with huge zeropage
not ok 160 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (2048 kB)
not ok 161 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (1048576 kB)
not ok 162 Longterm R/O pin is reliable
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's add basic tests for COW with non-anonymous pages in private
mappings: write access should properly trigger COW and result in the
private changes not being visible through other page mappings.
Especially, add tests for:
* Zeropage
* Huge zeropage
* Ordinary pagecache pages via memfd and tmpfile()
* Hugetlb pages via memfd
Fortunately, all tests pass.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O
long-term pinning)".
Until now, we did not support reliable R/O long-term pinning in COW
mappings. That means that if we trigger R/O long-term pinning in a
MAP_PRIVATE mapping, we can end up pinning the (R/O-mapped) shared
zeropage or a pagecache page.
The next write access would trigger a write fault and replace the pinned
page by an exclusive anonymous page in the process page table; whatever
the process would write to that private page copy would not be visible to
the owner of the previous page pin: for example, RDMA could read stale
data. The end result is essentially an unexpected and hard-to-debug
memory corruption.
Some drivers tried working around that limitation by using
"FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning for now.
FOLL_WRITE would trigger a write fault, if required, and break COW before
pinning the page. FOLL_FORCE is required because the VMA might lack write
permissions, and drivers wanted to make that work as well, just like one
would expect (no write access, but still triggering a write access to
break COW).
However, that is not a practical solution, because
(1) Drivers that don't stick to that undocumented and debatable pattern
would still run into that issue. For example, VFIO only uses
FOLL_LONGTERM for R/O long-term pinning.
(2) Using FOLL_WRITE just to work around a COW mapping + page pinning
limitation is unintuitive. FOLL_WRITE would, for example, mark the
page softdirty or trigger uffd-wp, even though there actually isn't
going to be any write access.
(3) The purpose of FOLL_FORCE is debug access, not working around missing
VMA permissions for arbitrary drivers.
So instead, make R/O long-term pinning work as expected, by breaking COW
in a COW mapping early, such that we can remove any FOLL_FORCE usage from
drivers and make FOLL_FORCE ptrace-specific (renaming it to FOLL_PTRACE).
More details in patch #8.
This patch (of 19):
Originally, the plan was to have separate tests for testing COW of
non-anonymous (e.g., shared zeropage) pages.
It turns out that we'd need a lot of similar functionality and that there
isn't a really good reason to keep it separate. So let's prepare for
non-anon tests by renaming to "cow".
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andy Walls <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Bernard Metzler <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christian Benvenuti <[email protected]>
Cc: Christian Gmeiner <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: James Morris <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Kentaro Takeda <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Lucas Stach <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Nelson Escobar <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oded Gabbay <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul Moore <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Russell King <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tomasz Figa <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit 6a108a14fa35 ("kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT")
introduces CONFIG_EXPERT to carry the previous intent of CONFIG_EMBEDDED
and just gives that intent a much better name. That was clearly a good
and long-overdue renaming, and it is clearly an improvement that has
helped manage the kernel build configuration over the last decade.
However, rather than bravely and radically just deleting CONFIG_EMBEDDED,
this commit gave CONFIG_EMBEDDED a new intended semantics, while leaving
it to future contributors to implement that intended semantics:
A new CONFIG_EMBEDDED option is added that automatically selects
CONFIG_EXPERT when enabled and can be used in the future to isolate
options that should only be considered for embedded systems (RISC
architectures, SLOB, etc).
Since then, this CONFIG_EMBEDDED implicitly had two purposes:
- It can make even more options visible beyond what CONFIG_EXPERT makes
  visible. In other words, it may introduce another level of enabling the
  visibility of configuration options: always visible, visible with
  CONFIG_EXPERT, and visible with CONFIG_EMBEDDED.
- Set certain default values of some configurations differently,
  following the assumption that configuring a kernel build for an
  embedded system generally starts with a different set of default values
  compared to kernel builds for all other kinds of systems.
Considering the second purpose, note that even arguing that a kernel build
for an embedded system would choose some values differently is already
tricky: the set of embedded systems with Linux kernels is quite diverse.
Many embedded systems have powerful CPUs, and it is not clear that all
embedded systems optimize towards one specific aspect, e.g., a smaller
kernel image size. So, it is unclear if starting with the "one set of
default configuration values" induced by CONFIG_EMBEDDED is a good offer
for developers configuring their kernels. Also, the differences in needed
user-space features between an embedded system and a non-embedded system
are probably difficult or even impossible to name in some generic way.
So it is not surprising that in the last decade hardly anyone has
contributed changes to make something default differently in case of
CONFIG_EMBEDDED=y.
Currently, in v6.0-rc4, SECRETMEM is the only config switched off if
CONFIG_EMBEDDED=y.
As long as that is actually the only option that currently is selected or
deselected, it is better to just make SECRETMEM configurable at build time
by experts using menuconfig instead.
Make SECRETMEM configurable when EXPERT is set, and otherwise default it
to yes. Further, SECRETMEM needs ARCH_HAS_SET_DIRECT_MAP.
This allows us to remove CONFIG_EMBEDDED in the near future.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lukas Bulwahn <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Reviewed-by: Masahiro Yamada <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This restriction was created because FOLL_LONGTERM used to scan the vma
list, so it could not tolerate becoming unlocked. That was fixed in
commit 52650c8b466b ("mm/gup: remove the vma allocation from
gup_longterm_locked()") and the restriction on !vma was removed.
However, the locked restriction remained, even though it isn't necessary
anymore.
Adjust __gup_longterm_locked() so it can handle the mmap_read_lock()
becoming unlocked while it is looping for migration. Migration does not
require the mmap_read_sem because it is only handling struct pages. If we
had to unlock, then ensure the whole thing returns unlocked.
Remove __get_user_pages_remote() and __gup_longterm_unlocked(). These
cases can now just directly call other functions.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When testing overflow and overread, there is no need to keep unnecessary
compilation warnings; we should simply ignore them.
The motivation for this patch is to eliminate the compilation warning:
maybe one day we will compile the kernel with "-Werror -Wall", at which
point this compilation warning will turn into a compilation error. We
should fix this error in advance.
How to reproduce the problem (with gcc-11.3.1):
$ make -C tools/testing/selftests/
...
warning: `write' reading 4294967295 bytes from a region of size 1
[-Wstringop-overread]
warning: `read' writing 4294967295 bytes into a region of size 25
overflows the destination [-Wstringop-overflow=]
"-Wno-stringop-overread" is supported at least in gcc-11.1.0.
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d14c547abd484d3540b692bb8048c4a6efe92c8b
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Rong Tao <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The ei pointer assignment does not need a type cast.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Li zeming <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Currently, drop_caches reclaims node by node, looping on each node until
reclaim cannot make progress. This can, however, leave quite some slab
entries (such as filesystem inodes) unreclaimed if objects on, say, node 1
keep objects on node 0 pinned. So move the "loop until no progress" loop
out to the node-by-node iteration, so that reclaim is retried on all nodes
as long as reclaim on some node makes progress. This fixes a problem
where drop_caches was not reclaiming lots of inodes that were otherwise
perfectly fine to reclaim.
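In outline, a runnable model of the control-flow change; reclaim_node() is
a stand-in that fakes cross-node pinning, not the kernel function:
/* Model of the fix: the "repeat until no progress" loop wraps the
 * per-node sweep, so a node whose objects were unpinned by progress on
 * another node gets retried. */
#include <stdio.h>

#define NR_NODES 2

static unsigned long node0_objs = 2;
static unsigned long node1_pinned_objs = 1;

static unsigned long reclaim_node(int nid)
{
	if (nid == 0 && node0_objs) {
		node0_objs--;
		return 1;
	}
	/* node 1's objects only become reclaimable once node 0 is empty */
	if (nid == 1 && !node0_objs && node1_pinned_objs) {
		node1_pinned_objs--;
		return 1;
	}
	return 0;
}

int main(void)
{
	unsigned long freed, total = 0;

	do {	/* repeat the whole node sweep while any node progresses */
		freed = 0;
		for (int nid = 0; nid < NR_NODES; nid++)
			freed += reclaim_node(nid);
		total += freed;
	} while (freed);

	printf("freed %lu objects across nodes\n", total);	/* 3 */
	return 0;
}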
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jan Kara <[email protected]>
Reported-by: You Zhou <[email protected]>
Reported-by: Pengfei Xu <[email protected]>
Tested-by: Pengfei Xu <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Since commit 9a10064f5625 ("mm: add a field to store names for private
anonymous memory"), a name can be set for private anonymous memory, but
not for shared anonymous memory. However, naming shared anonymous memory
is just as useful for tracking purposes.
Extend the functionality to be able to set names for shared anon memory
as well.
There are two ways to create anonymous shared memory, using memfd or
directly via mmap():
1. fd = memfd_create(...)
mem = mmap(..., MAP_SHARED, fd, ...)
2. mem = mmap(..., MAP_SHARED | MAP_ANONYMOUS, -1, ...)
In both cases the anonymous shared memory is created the same way by
mapping an unlinked file on tmpfs.
The memfd way allows giving a name to the anonymous shared memory, but it
is not useful when parts of the shared memory need to have distinct names.
Example use case: The VMM maps VM memory as anonymous shared memory (not
private, because the VMM is sandboxed and drivers are running in their own
processes). However, the VM tells the VMM how parts of the memory are
actually used by the guest, how each of the segments should be backed
(e.g. 4K pages, 2M pages), and some other information about the segments.
The naming allows us to monitor the effective memory footprint for each
of these segments from the host without looking inside the guest.
Sample usage:
/* Create a shared anonymous segment */
anon_shmem = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
/* Name the segment: "MY-NAME" */
rv = prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
           anon_shmem, SIZE, "MY-NAME");
Resulting cat /proc/<pid>/maps (and smaps) output:
7fc8e2b4c000-7fc8f2b4c000 rw-s 00000000 00:01 1024 [anon_shmem:MY-NAME]
If the segment is not named, the output is:
7fc8e2b4c000-7fc8f2b4c000 rw-s 00000000 00:01 1024 /dev/zero (deleted)
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Pasha Tatashin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: "Kirill A . Shutemov" <[email protected]>
Cc: Liam Howlett <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Vincent Whitchurch <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: xu xin <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The shrinker.h header depends on the user including other headers before
it for the types it uses. Fix this by including the appropriate headers
in shrinker.h itself.
./include/linux/shrinker.h:13:9: error: unknown type name `gfp_t'
13 | gfp_t gfp_mask;
| ^~~~~
./include/linux/shrinker.h:71:26: error: field `list' has incomplete type
71 | struct list_head list;
| ^~~~
./include/linux/shrinker.h:82:9: error: unknown type name `atomic_long_t'
82 | atomic_long_t *nr_deferred;
|
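The shape of the fix, assuming the usual homes for these types; the exact
include set the patch adds is an assumption here:
/* Sketch: make shrinker.h self-sufficient by including headers for the
 * types it references. Assumed mapping: gfp_t and struct list_head come
 * from <linux/types.h>, atomic_long_t from <linux/atomic.h>. */
#ifndef _LINUX_SHRINKER_H
#define _LINUX_SHRINKER_H

#include <linux/atomic.h>
#include <linux/types.h>

/* ... struct shrink_control and struct shrinker as before ... */

#endif /* _LINUX_SHRINKER_H */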
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 83aeeada7c69 ("vmscan: use atomic-long for shrinker batching")
Fixes: b0d40c92adaf ("superblock: introduce per-sb cache shrinker infrastructure")
Signed-off-by: T.J. Mercier <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The common hugetlb unmap routine __unmap_hugepage_range performs mmu
notification calls. However, in the case where __unmap_hugepage_range is
called via __unmap_hugepage_range_final, mmu notification calls are
performed earlier in other calling routines.
Remove mmu notification calls from __unmap_hugepage_range. Add
notification calls to the only other caller: unmap_hugepage_range.
unmap_hugepage_range is called for truncation and hole punch, so change
notification type from UNMAP to CLEAR as this is more appropriate.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Kravetz <[email protected]>
Suggested-by: Peter Xu <[email protected]>
Cc: Wei Chen <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mina Almasry <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's add a sanity check, under CONFIG_DEBUG_VM, on the write bit at
whatever chance we have when walking through the pgtables. It can surface
the error earlier, even before the app notices that the data was corrupted
on the snapshot. It also helps us identify that this is a wrong pgtable
setup, so hopefully it is great information to have for debugging too.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nadav Amit <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
I noticed a typo in a code comment and I fixed it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yixuan Cao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
MADV_FREE pages have been moved into the LRU_INACTIVE_FILE list by commit
f7ad2a6cb9f7 ("mm: move MADV_FREE pages into LRU_INACTIVE_FILE list").
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jian Wen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
alloc_memory_type() returns error pointers on error instead of NULL. Use
IS_ERR() to check the return value to fix this.
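The idiom in question, modeled standalone; IS_ERR()/ERR_PTR() below are
simplified local copies of the kernel macros, and the stub function is
illustrative, not the real alloc_memory_type():
/* Functions returning error pointers must be checked with IS_ERR(),
 * not compared against NULL. */
#include <errno.h>
#include <stdio.h>

#define MAX_ERRNO	4095
#define ERR_PTR(err)	((void *)(long)(err))
#define IS_ERR(ptr)	((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)
#define PTR_ERR(ptr)	((long)(ptr))

static void *alloc_memory_type(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : "memory-type";
}

int main(void)
{
	void *mt = alloc_memory_type(1);

	if (IS_ERR(mt)) {	/* a NULL check would miss this */
		fprintf(stderr, "alloc failed: %ld\n", PTR_ERR(mt));
		return 1;
	}
	return 0;
}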
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7b88bda3761b ("mm/demotion/dax/kmem: set node's abstract distance to MEMTIER_DEFAULT_DAX_ADISTANCE")
Signed-off-by: Miaoqian Lin <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Wei Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Quite straightforwardly, the page functions are converted to the
corresponding folio functions. The same goes for the comments.
THP-specific code is converted to use large folios.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "migrate: convert migrate_pages()/unmap_and_move() to use
folios", v2.
The conversion is quite straightforward: just replace the page API with
the corresponding folio API. migrate_pages() and unmap_and_move() mostly
work with folios (head pages) only.
This patch (of 2):
Quite straightforwardly, the page functions are converted to the
corresponding folio functions. The same goes for the comments.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Revert commit 831568214883 ("mm: migration: fix the FOLL_GET failure on
following huge page"), since after commit 1a6baaa0db73 ("s390/hugetlb:
switch to generic version of follow_huge_pud()") and commit 57a196a58421
("hugetlb: simplify hugetlb handling in follow_page_mask") were merged,
all of the huge page following routines now support the FOLL_GET
operation.
Link: https://lkml.kernel.org/r/496786039852aba90ffa68f10d0df3f4236a990b.1667983080.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <[email protected]>
Acked-by: Haiyue Wang <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|