|
Replace seven calls to compound_head() with one. We still use the page as
page_mapped() is different from folio_mapped().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
All users of target.page convert it to the folio, so we can just return
the folio directly and save a few calls to compound_head().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Convert memcontrol charge moving to use folios".
No part of these patches should change behaviour; all the called functions
already convert from page to folio, so this ought to simply be a reduction
in the number of calls to compound_head().
This patch (of 4):
Remove many calls to compound_head() by calling page_folio() once at the
start of each stanza which receives a struct page from 'target'. There
should be no change in behaviour here as all the called functions start
out by converting the page to its folio.
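As a rough, hypothetical illustration of the pattern (not the exact upstream diff), each stanza
now looks along these lines:
        case MC_TARGET_PAGE:
                page = target.page;
                folio = page_folio(page);       /* single conversion per stanza */
                if (!device && !folio_isolate_lru(folio))
                        goto put;
                /* ...subsequent calls take the folio directly... */
                folio_put(folio);               /* rather than put_page(page) */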
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We avoid allocating THP for the temporary stack; even though
khugepaged_enter_vma() is called for stack VMAs, it actually returns
false.  So there is no need to call it in the first place at all.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Shi <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>
Cc: Christopher Lameter <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: kernel test robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When the boot-time kernel option "cgroup.memory=nokmem" is used, all lru
items are inserted into the list_lru_node.  But for users who invoke
list_lru_init_memcg() to initialize a list_lru, list_lru_memcg_aware()
still returns true, which brings unneeded memcg-related operations.
To make things more consistent, disable memcg_aware when cgroup.memory
is set to "nokmem".
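A minimal sketch of the idea, assuming the check lands in __list_lru_init()
(the exact placement may differ):
        int __list_lru_init(struct list_lru *lru, bool memcg_aware, ...)
        {
                /*
                 * kmem accounting is off ("cgroup.memory=nokmem"), so the
                 * per-memcg lists would never be used; don't pay for them.
                 */
                if (mem_cgroup_kmem_disabled())
                        memcg_aware = false;
                lru->memcg_aware = memcg_aware;
                ...
        }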
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Haifeng Xu <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The clearing and copying of gigantic huge pages have been converted to use
nth_page() to handle possibly discontiguous struct pages (SPARSEMEM without
VMEMMAP), but the non-gigantic part was not changed; fix it too.
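The pattern being applied to the non-gigantic path is roughly (illustrative,
not the exact hunk):
        for (i = 0; i < pages_per_huge_page; i++) {
                /* pages may not be contiguous with SPARSEMEM && !VMEMMAP */
                struct page *p = nth_page(page, i);     /* not: page + i */

                cond_resched();
                clear_user_highpage(p, addr + i * PAGE_SIZE);
        }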
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The file parameter of __remove_shared_vm_struct() is no longer used;
remove it.
vma_link() and mmap_region() share some of the same code; introduce a
vma_link_file() helper function to simplify the code.
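The helper is roughly of this shape (a sketch based on the existing
vma_link() code, not necessarily the exact final version):
        static void vma_link_file(struct vm_area_struct *vma)
        {
                struct file *file = vma->vm_file;
                struct address_space *mapping;

                if (file) {
                        mapping = file->f_mapping;
                        i_mmap_lock_write(mapping);
                        __vma_link_file(vma, mapping);
                        i_mmap_unlock_write(mapping);
                }
        }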
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yajun Deng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The return type of folio_test_hugetlb() is bool; there is no need to
assign it to an integer.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hongbo Li <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "implement "memmap on memory" feature on s390".
This series provides "memmap on memory" support on the s390 platform.
"memmap on memory" allows the struct pages array to be allocated from the
hotplugged memory range instead of from main system memory.
s390 currently preallocates the struct pages array for all potentially
possible memory, which ensures memory onlining always succeeds, but at the
cost of significant memory consumption from the available system memory
during boot time.  In certain extreme configurations, this could lead to
IPL failure.
"memmap on memory" ensures the struct pages array is populated from the
self-contained hotplugged memory range instead of depleting the available
system memory, and this could eliminate IPL failure on the s390 platform.
On other platforms, system might go OOM when the physically hotplugged
memory depletes the available memory before it is onlined. Hence, "memmap
on memory" feature was introduced as described in commit a08a2ae34613
("mm,memory_hotplug: allocate memmap from the added memory range").
Unlike other architectures, s390 memory blocks are not physically
accessible until they are online.  To make them physically accessible, two
new memory notifiers, MEM_PREPARE_ONLINE and MEM_FINISH_OFFLINE, are
added; they let the hypervisor be informed that the memory should be made
physically accessible.  This allows "memmap on memory" initialization
during the memory hotplug onlining phase, which is performed before the
MEM_GOING_ONLINE notifier is called.
Patch 1 introduces MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers
to prepare the transition of memory to and from a physically accessible
state. New mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced to ensure
altmap cannot be written when adding memory - before it is set online.
This enhancement is crucial for implementing the "memmap on memory"
feature for s390 in a subsequent patch.
Patch 2 allocates vmemmap pages from the self-contained memory range for
s390.  It allocates the memory map (struct pages array) from the hotplugged
memory range rather than from system memory, by passing altmap to the
vmemmap functions.
Patch 3 removes unhandled memory notifier types on s390.
Patch 4 implements the MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory
notifiers on s390.  The MEM_PREPARE_ONLINE notifier makes the memory block
physically accessible via the sclp assign command.  The notifier ensures
the self-contained memory maps are accessible, hence enabling "memmap on
memory" on s390.  The MEM_FINISH_OFFLINE notifier shifts the memory block
to an inaccessible state via the sclp unassign command.
Patch 5 finally enables MHP_MEMMAP_ON_MEMORY on s390.
This patch (of 5):
Introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to
prepare the transition of memory to and from a physically accessible
state. This enhancement is crucial for implementing the "memmap on
memory" feature for s390 in a subsequent patch.
Platforms such as x86 can support physical memory hotplug via ACPI.  When
physical memory is hotplugged, an ACPI event leads to the memory addition
with the following callchain:
acpi_memory_device_add()
-> acpi_memory_enable_device()
-> __add_memory()
After this, the hotplugged memory is physically accessible, and altmap
support is prepared, before the "memmap on memory" initialization in
memory_block_online() is called.
On s390, memory hotplug works in a different way. The available hotplug
memory has to be defined upfront in the hypervisor, but it is made
physically accessible only when the user sets it online via sysfs,
currently in the MEM_GOING_ONLINE notifier.  This is too late, since
"memmap on memory" initialization is performed before the MEM_GOING_ONLINE
notifier is called.
During the memory hotplug addition phase, altmap support is prepared;
during the memory onlining phase, s390 requires the memory to be physically
accessible before it can subsequently initiate the "memmap on memory"
initialization process.
The memory provider will handle new MEM_PREPARE_ONLINE /
MEM_FINISH_OFFLINE notifications and make the memory accessible.
The mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced and is relevant when
used along with MHP_MEMMAP_ON_MEMORY, because the altmap cannot be written
(e.g., poisoned) when adding memory -- before it is set online. This
allows for adding memory with an altmap that is not currently made
available by a hypervisor. When onlining that memory, the hypervisor can
be instructed to make that memory accessible via the new notifiers and the
onlining phase will not require any memory allocations, which is helpful
in low-memory situations.
All architectures ignore unknown memory notifiers. Therefore, the
introduction of these new notifiers does not result in any functional
modifications across architectures.
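For illustration, a memory-hotplug notifier that consumes the new events
could look like the sketch below; make_block_accessible() and
make_block_inaccessible() are hypothetical hypervisor hooks, not existing
kernel functions:
        static int example_memory_notifier(struct notifier_block *nb,
                                           unsigned long action, void *arg)
        {
                struct memory_notify *mhp = arg;

                switch (action) {
                case MEM_PREPARE_ONLINE:
                        /* e.g. sclp assign on s390 */
                        return notifier_from_errno(
                                make_block_accessible(mhp->start_pfn, mhp->nr_pages));
                case MEM_FINISH_OFFLINE:
                        /* return the block to an inaccessible state */
                        make_block_inaccessible(mhp->start_pfn, mhp->nr_pages);
                        break;
                }
                return NOTIFY_OK;
        }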
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sumanth Korikkar <[email protected]>
Suggested-by: Gerald Schaefer <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The current placement of trace_cma_alloc_start/finish misses the fail
cases: !cma || !cma->count || !cma->bitmap.
trace_cma_alloc_finish is also not emitted for the failure case
where bitmap_count > bitmap_maxno.
Fix these missed cases by moving the start event before the failure
checks and moving the finish event to the out label.
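Schematically (a sketch; tracepoint arguments trimmed for brevity):
        struct page *cma_alloc(struct cma *cma, unsigned long count, ...)
        {
                struct page *page = NULL;
                ...
                trace_cma_alloc_start(...);     /* before the early bail-outs */

                if (!cma || !cma->count || !cma->bitmap)
                        goto out;
                ...
        out:
                trace_cma_alloc_finish(...);    /* reached on every path */
                return page;
        }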
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7bc1aec5e287 ("mm: cma: add trace events for CMA alloc perf testing")
Signed-off-by: Kalesh Singh <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Liam Mark <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The partial slabs on the cpu partial list are no longer frozen since
commit 8cd3fa428b56 ("slub: Delay freezing of partial slabs") was merged,
so fix the comment.
Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
We don't use the object_size parameter in kmem_cache_flags(), so just
remove it.
Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
For simple filesystems that use directory offset mapping, rely
strictly on the directory offset map to tell when a directory has
no children.
After this patch is applied, the emptiness test holds only the RCU
read lock when the directory being tested has no children.
In addition, this adds another layer of confirmation that
simple_offset_add/remove() are working as expected.
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Link: https://lore.kernel.org/r/170820143463.6328.7872919188371286951.stgit@91.116.238.104.host.secureserver.net
Signed-off-by: Christian Brauner <[email protected]>
|
|
Pick up CXL CPER notification removal for v6.8-rc6, to return in a later
merge window.
|
|
Add a blurb that simply dirtying the folio will persist data for in-kernel
shmem files. This is what most of the callers already do.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Matthew Wilcox (Oracle)" <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
XFS wants to use this for its internal in-memory data structures and
currently duplicates the functionality. Export shmem_kernel_file_setup
to allow XFS to switch over to using the proper kernel API.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Matthew Wilcox (Oracle)" <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Export shmem_get_folio as a slightly lower-level variant of
shmem_read_folio_gfp. This will be useful for XFS xfile use cases
that want to pass SGP_NOALLOC or get a locked page, which the thin
shmem_read_folio_gfp wrapper can't provide.
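A hypothetical in-kernel caller would then do something like the following
(error handling abbreviated; the exact SGP_NOALLOC hole semantics should be
checked against shmem.c):
        struct folio *folio;
        void *kaddr;
        int error;

        error = shmem_get_folio(inode, index, &folio, SGP_NOALLOC);
        if (error)
                return error;
        /* folio comes back locked and with a reference held */
        kaddr = kmap_local_folio(folio, offset);
        memcpy(buf, kaddr, len);
        kunmap_local(kaddr);
        folio_unlock(folio);
        folio_put(folio);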
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Matthew Wilcox (Oracle)" <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Move the check that the inode really is a shmemfs one from
shmem_read_folio_gfp to shmem_get_folio_gfp given that shmem_get_folio
can also be called from outside of shmem.c. Also turn it into a
WARN_ON_ONCE and error return instead of BUG_ON to be less severe.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Matthew Wilcox (Oracle)" <[email protected]>
Reviewed-by: "Darrick J. Wong" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
Set the a_ops in shmem_symlink before reading a folio from the mapping
to prepare for asserting that shmem_get_folio is only called on shmem
mappings.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Matthew Wilcox (Oracle)" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
shmem_aops really should not be exported to the world. Move
shmem_mapping and export it as internal for the one semi-legitimate
modular user in udmabuf.
This effectively reverts commit 30e6a51dbb05 ("mm/shmem.c: make
shmem_mapping() inline"), which added a bogus shmem_aops non-GPL export
for no reason whatsoever, as there was no shmem_mapping() call outside of
core MM code at that point.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Matthew Wilcox (Oracle)" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
mapping_set_update is only used inside mm/. Move mapping_set_update to
mm/internal.h and turn it into an inline function instead of a macro.
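As a sketch, the resulting inline is expected to look roughly like the old
macro it replaces:
        static inline void mapping_set_update(struct xa_state *xas,
                                              struct address_space *mapping)
        {
                if (dax_mapping(mapping) || shmem_mapping(mapping))
                        return;
                xas_set_update(xas, workingset_update_node);
                xas_set_lru(xas, &shadow_nodes);
        }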
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: "Matthew Wilcox (Oracle)" <[email protected]>
Signed-off-by: Chandan Babu R <[email protected]>
|
|
release_free_meta() accesses the shadow directly through the path
kasan_slab_free
__kasan_slab_free
kasan_release_object_meta
release_free_meta
kasan_mem_to_shadow
There are no kasan_arch_is_ready() guards here, allowing an oops when the
shadow is not initialized. The oops can be seen on a Power8 KVM guest.
This patch adds the guard to release_free_meta(), as it's the first level
that specifically requires the shadow.
It is safe to put the guard at the start of this function, before the
stack put: only kasan_save_free_info() can initialize the saved stack,
which itself is guarded with kasan_arch_is_ready() by its caller
poison_slab_object(). If the arch becomes ready before
release_free_meta() then we will not observe KASAN_SLAB_FREE_META in the
object's shadow, so we will not put an uninitialized stack either.
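The fix is essentially a guard at the top of the function; the body shown
here is only a sketch of the surrounding code:
        static void release_free_meta(const void *object,
                                      struct kasan_free_meta *meta)
        {
                if (!kasan_arch_is_ready())
                        return;

                /* Check if free meta is valid. */
                if (*(u8 *)kasan_mem_to_shadow(object) != KASAN_SLAB_FREE_META)
                        return;

                stack_depot_put(meta->free_track.stack);
        }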
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 63b85ac56a64 ("kasan: stop leaking stack trace handles")
Signed-off-by: Benjamin Gray <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
For online parameter changes, DAMON_LRU_SORT creates new schemes based on
the latest values of the parameters and replaces the old schemes with the
new ones.  When creating them, the internal status of the quotas of the old
schemes is not preserved.  As a result, charging of the quota starts from
zero after the online tuning.  The data that was collected to estimate the
throughput of the scheme's action is also reset, so the estimation has to
start from scratch again.  Because the throughput estimation is used to
convert the time quota to the effective size quota, this could result in
temporary time quota inaccuracy.  It would be recovered over time, though.
In short, the quota accuracy could be temporarily degraded after an online
parameters update.
Fix the problem by checking the case and copying the internal fields for
the status.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 40e983cca927 ("mm/damon: introduce DAMON-based LRU-lists Sorting")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> [6.0+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/damon: fix quota status loss due to online tunings".
DAMON_RECLAIM and DAMON_LRU_SORT are not preserving the internal quota
status when applying new user parameters, and hence could cause temporary
quota accuracy degradation.  Fix it by preserving the status.
This patch (of 2):
For online parameter changes, DAMON_RECLAIM creates a new scheme based on
the latest values of the parameters and replaces the old scheme with the
new one.  When creating it, the internal status of the quota of the old
scheme is not preserved.  As a result, charging of the quota starts from
zero after the online tuning.  The data that was collected to estimate the
throughput of the scheme's action is also reset, so the estimation has to
start from scratch again.  Because the throughput estimation is used to
convert the time quota to the effective size quota, this could result in
temporary time quota inaccuracy.  It would be recovered over time, though.
In short, the quota accuracy could be temporarily degraded after an online
parameters update.
Fix the problem by checking the case and copying the internal fields for
the status.
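A sketch of the kind of copy helper this implies; the field names follow my
reading of struct damos_quota and the real patch may differ:
        static void damon_reclaim_copy_quota_status(struct damos_quota *dst,
                                                    struct damos_quota *src)
        {
                dst->total_charged_sz = src->total_charged_sz;
                dst->total_charged_ns = src->total_charged_ns;
                dst->charged_sz = src->charged_sz;
                dst->charged_from = src->charged_from;
                dst->charge_target_from = src->charge_target_from;
                dst->charge_addr_from = src->charge_addr_from;
        }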
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e035c280f6df ("mm/damon/reclaim: support online inputs update")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> [5.19+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
When handling the 'commit_schemes_quota_goals' command, the command
handler, damos_sysfs_set_quota_scores(), assumes the number of schemes
sysfs directories will be the same as the number of schemes of the DAMON
context.  The assumption is wrong since users can remove schemes sysfs
directories while DAMON is running.  In that case, illegal memory accesses
can happen.  Fix it by checking for the case.
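Conceptually, the handler just needs to stop once it runs out of sysfs
directories; a sketch (member names approximate):
        damon_for_each_scheme(scheme, ctx) {
                /* users may have removed scheme sysfs dirs meanwhile */
                if (i >= sysfs_schemes->nr)
                        break;
                damos_sysfs_set_quota_score(
                        sysfs_schemes->schemes_arr[i]->quotas->goals,
                        &scheme->quota);
                i++;
        }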
Link: https://lkml.kernel.org/r/[email protected]
Fixes: d91beaa505a0 ("mm/damon/sysfs-schemes: implement a command for scheme quota goals only commit")
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The swapaccount deprecation warning is throwing false positives. Since we
deprecated the knob and defaulted to enabling, the only reports we've been
getting are from folks that set swapaccount=1. While this is a nice
affirmation that always-enabling was the right choice, we certainly don't
want to warn when users request the supported mode.
Only warn when disabling is requested, and clarify the warning.
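In other words, the boot parameter handler should only warn for a false
value, roughly (warning text abbreviated):
        static int __init setup_swap_account(char *s)
        {
                bool res;

                if (!kstrtobool(s, &res) && !res)
                        pr_warn_once("The swapaccount=0 commandline option is deprecated "
                                     "and will be removed.\n");
                return 1;
        }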
[[email protected]: spelling: "commdandline" -> "commandline"]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b25806dcd3d5 ("mm: memcontrol: deprecate swapaccounting=0 mode")
Signed-off-by: Colin Ian King <[email protected]>
Reported-by: "Jonas Schäfer" <[email protected]>
Reported-by: Narcis Garcia <[email protected]>
Suggested-by: Yosry Ahmed <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Yosry Ahmed <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Shakeel Butt <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit 77e6c43e137c ("memblock: introduce MEMBLOCK_RSRV_NOINIT flag")
skipped adding the newly introduced memblock flag to the flagname[] array,
thus preventing correct memblock flags output for applicable memblock
regions.
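The fix amounts to adding an entry for the new flag, something like the
following (the short name shown for the new entry is illustrative):
        static const char * const flagname[] = {
                [ilog2(MEMBLOCK_HOTPLUG)] = "HOTPLUG",
                [ilog2(MEMBLOCK_MIRROR)] = "MIRROR",
                [ilog2(MEMBLOCK_NOMAP)] = "NOMAP",
                [ilog2(MEMBLOCK_DRIVER_MANAGED)] = "DRV_MNG",
                [ilog2(MEMBLOCK_RSRV_NOINIT)] = "RSV_NIT",
        };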
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 77e6c43e137c ("memblock: introduce MEMBLOCK_RSRV_NOINIT flag")
Signed-off-by: Anshuman Khandual <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We have to invalidate any duplicate entry even when !zswap_enabled, since
zswap can be disabled at any time.  If the folio was stored successfully
before, then got dirtied again while zswap is disabled, we won't invalidate
the old duplicate entry in zswap_store(), so a later lru writeback may
overwrite the new data in the swapfile.
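Schematically, zswap_store() should fall through to the invalidation path
even when zswap is disabled (a sketch of the control flow):
        bool zswap_store(struct folio *folio)
        {
                ...
                if (!zswap_enabled)
                        goto check_old;         /* was: return false */
                ...
        check_old:
                /* drop any old duplicate entry for this offset */
                spin_lock(&tree->lock);
                entry = zswap_rb_search(&tree->rbroot, offset);
                if (entry)
                        zswap_invalidate_entry(tree, entry);
                spin_unlock(&tree->lock);
                return false;
        }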
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 42c06a0e8ebe ("mm: kill frontswap")
Signed-off-by: Chengming Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Nhat Pham <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When skipping swapcache for SWP_SYNCHRONOUS_IO, if two or more threads
swapin the same entry at the same time, they get different pages (A, B).
Before one thread (T0) finishes the swapin and installs page (A) to the
PTE, another thread (T1) could finish swapin of page (B), swap_free the
entry, then swap out the possibly modified page reusing the same entry.
It breaks the pte_same() check in (T0) because the PTE value is unchanged,
causing an ABA problem.  Thread (T0) will install a stale page (A) into the
PTE and cause data corruption.
One possible callstack is like this:
CPU0                                 CPU1
----                                 ----
do_swap_page()                       do_swap_page() with same entry
<direct swapin path>                 <direct swapin path>
<alloc page A>                       <alloc page B>
swap_read_folio() <- read to page A  swap_read_folio() <- read to page B
<slow on later locks or interrupt>   <finished swapin first>
...                                  set_pte_at()
                                     swap_free() <- entry is free
                                     <write to page B, now page A stale>
                                     <swap out page B to same swap entry>
pte_same() <- check passes, PTE seems
              unchanged, but page A
              is stale!
swap_free() <- page B content lost!
set_pte_at() <- stale page A installed!
And besides, for ZRAM, swap_free() allows the swap device to discard the
entry content, so even if page (B) is not modified, if swap_read_folio()
on CPU0 happens later than swap_free() on CPU1, it may also cause data
loss.
To fix this, reuse swapcache_prepare(), which pins the swap entry using the
cache flag and allows only one thread to swap it in; this also prevents any
parallel code from putting the entry into the swap cache.  Release the pin
after the page table lock is dropped.
Racers just loop and wait, since this is a rare and very short event.  A
schedule_timeout_uninterruptible(1) call is added to avoid repeated page
faults wasting too much CPU, causing livelock or adding too much noise to
perf statistics.  A similar livelock issue was described in commit
029c4628b2eb ("mm: swap: get rid of livelock in swapin readahead")
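A simplified sketch of the synchronous-swapin path after the fix (details
and error paths omitted):
        /* do_swap_page(), SWP_SYNCHRONOUS_IO path with no swap cache */
        if (swapcache_prepare(entry)) {
                /*
                 * Another thread is swapping this entry in and holds the
                 * SWAP_HAS_CACHE pin; back off briefly and retry the fault.
                 */
                schedule_timeout_uninterruptible(1);
                goto out;
        }
        ...
        /* after set_pte_at() and after the page table lock is dropped */
        swapcache_clear(si, entry);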
Reproducer:
This race can be triggered easily using a well-constructed reproducer and
a patched brd (with a delay in the read path) [1].
With the latest 6.8 mainline, race-caused data loss can be observed easily:
$ gcc -g -lpthread test-thread-swap-race.c && ./a.out
Polulating 32MB of memory region...
Keep swapping out...
Starting round 0...
Spawning 65536 workers...
32746 workers spawned, wait for done...
Round 0: Error on 0x5aa00, expected 32746, got 32743, 3 data loss!
Round 0: Error on 0x395200, expected 32746, got 32743, 3 data loss!
Round 0: Error on 0x3fd000, expected 32746, got 32737, 9 data loss!
Round 0 Failed, 15 data loss!
This reproducer spawns multiple threads sharing the same memory region
using a small swap device.  Every two threads update mapped pages one by
one in opposite directions, trying to create a race, with one dedicated
thread that keeps swapping the data out using madvise.
The reproducer hit the race about once every 5 minutes, so the race should
be entirely possible in production.
After this patch, I ran the reproducer for over a few hundred rounds and
no data loss was observed.
Performance overhead is minimal, microbenchmark swapin 10G from 32G
zram:
Before: 10934698 us
After: 11157121 us
Cached: 13155355 us (Dropping SWP_SYNCHRONOUS_IO flag)
[[email protected]: v4]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: "Huang, Ying" <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Link: https://github.com/ryncsn/emm-test-project/tree/master/swap-stress-race [1]
Signed-off-by: Kairui Song <[email protected]>
Reviewed-by: "Huang, Ying" <[email protected]>
Acked-by: Yu Zhao <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Chris Li <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Cc: Yu Zhao <[email protected]>
Cc: Barry Song <[email protected]>
Cc: SeongJae Park <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When a folio is swapped in, the protection size of the corresponding zswap
LRU is incremented, so that the zswap shrinker is more conservative with
its reclaiming action. This field is embedded within the struct lruvec,
so updating it requires looking up the folio's memcg and lruvec. However,
currently this lookup can happen after the folio is unlocked, for instance
if a new folio is allocated, and swap_read_folio() unlocks the folio
before returning. In this scenario, there is no stability guarantee for
the binding between a folio and its memcg and lruvec:
* A folio's memcg and lruvec can be freed between the lookup and the
update, leading to a UAF.
* Folio migration can clear the now-unlocked folio's memcg_data, which
directs the zswap LRU protection size update towards the root memcg
instead of the original memcg. This was recently picked up by the
syzbot thanks to a warning in the inlined folio_lruvec() call.
Move the zswap LRU protection size update above the swap_read_folio()
call, and do it only when a new page is allocated, to prevent this.
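Schematically, the update now happens while the newly allocated folio is
still locked (a sketch, not the exact placement in do_swap_page()):
        /* SWP_SYNCHRONOUS_IO skip-swapcache path */
        folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vmf->address, false);
        if (folio) {
                ...
                /* folio is new, charged and still locked: memcg/lruvec are stable */
                zswap_folio_swapin(folio);
        }
        ...
        swap_read_folio(folio, true, NULL);     /* may unlock the folio */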
[[email protected]: add VM_WARN_ON_ONCE() to zswap_folio_swapin()]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: remove unneeded if (folio) checks]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Reported-by: [email protected]
Closes: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Nhat Pham <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
kdamond_apply_schemes() checks the apply intervals of schemes and avoids
applying any schemes if no scheme has passed its apply interval.  However,
the subsequent schemes-applying function, damon_do_apply_schemes(),
iterates all schemes without the apply interval check.  As a result, the
shortest apply interval is applied to all schemes.  Fix the problem by
checking the apply interval in damon_do_apply_schemes().
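The fix adds the same interval check inside the per-scheme loop, roughly:
        static void damon_do_apply_schemes(struct damon_ctx *c,
                                           struct damon_target *t,
                                           struct damon_region *r)
        {
                struct damos *s;

                damon_for_each_scheme(s, c) {
                        /* only apply schemes whose own interval has passed */
                        if (c->passed_sample_intervals != s->next_apply_sis)
                                continue;
                        ...
                }
        }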
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> [6.7.x]
Signed-off-by: Andrew Morton <[email protected]>
|
|
In zswap_writeback_entry(), after we get a folio from
__read_swap_cache_async(), we grab the tree lock again to check that the
swap entry was not invalidated and recycled. If it was, we delete the
folio we just added to the swap cache and exit.
However, __read_swap_cache_async() returns the folio locked when it is
newly allocated, which is always true for this path, and the folio is
ref'd. Make sure to unlock and put the folio before returning.
This was discovered by code inspection, probably because this path handles
a race condition that should not happen often, and the bug would not crash
the system; it only strands the folio indefinitely.
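The missing cleanup on that path is along these lines (entry_still_valid is
a placeholder for the existing tree check, not a real variable):
        /* zswap_writeback_entry(): entry was invalidated while we allocated */
        if (!entry_still_valid) {
                spin_unlock(&tree->lock);
                delete_from_swap_cache(folio);
                folio_unlock(folio);    /* newly added */
                folio_put(folio);       /* newly added */
                return -ENOMEM;
        }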
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 04fc7816089c ("mm: fix zswap writeback race condition")
Signed-off-by: Yosry Ahmed <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
'def_bool X' is a shorthand for 'bool' plus 'default X'.
'def_bool' is redundant where 'bool' is already present, so 'def_bool X'
can be replaced with 'default X', or removed if X is 'n'.
Signed-off-by: Masahiro Yamada <[email protected]>
|
|
numa_fill_memblks() fills in the gaps in numa_meminfo memblks over a
physical address range. To do so, it first creates a list of existing
memblks that overlap that address range. The issue is that it is off
by one when comparing to the end of the address range, so memblks
that do not overlap are selected.
The impact of selecting a memblk that does not actually overlap is
that an existing memblk may be filled when the expected action is to
do nothing and return NUMA_NO_MEMBLK to the caller. The caller can
then add a new NUMA node and memblk.
Replace the broken open-coded search for address overlap with the
memblock helper memblock_addrs_overlap().  Update the kernel doc
and in-code comments.
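Conceptually the selection loop becomes (a sketch, not the exact diff):
        for (i = 0; i < mi->nr_blks; i++) {
                struct numa_memblk *bi = &mi->blk[i];

                /* inclusive/exclusive end handling is done by the helper */
                if (memblock_addrs_overlap(start, end - start,
                                           bi->start, bi->end - bi->start))
                        blk[count++] = bi;
        }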
Suggested-by: "Huang, Ying" <[email protected]>
Fixes: 8f012db27c95 ("x86/numa: Introduce numa_fill_memblks()")
Signed-off-by: Alison Schofield <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Link: https://lore.kernel.org/r/10a3e6109c34c21a8dd4c513cf63df63481a2b07.1705085543.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <[email protected]>
|
|
On architectures with delay slots, instruction_pointer() may differ from
where the exception was triggered.
Use the just-introduced exception_ip to search exception tables to get rid
of the problem.
Fixes: 4bce37a68ff8 ("mips/mm: Convert to using lock_mm_and_find_vma()")
Reported-by: Xi Ruoyao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]/
Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiaxun Yang <[email protected]>
Signed-off-by: Thomas Bogendoerfer <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7
issues or aren't considered to be needed in earlier kernel versions"
* tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
nilfs2: fix potential bug in end_buffer_async_write
mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
MAINTAINERS: Leo Yan has moved
mm/zswap: don't return LRU_SKIP if we have dropped lru lock
fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
mailmap: switch email address for John Moon
mm: zswap: fix objcg use-after-free in entry destruction
mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()
arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros
mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page
mm: memcg: optimize parent iteration in memcg_rstat_updated()
nilfs2: fix data corruption in dsync block recovery for small block sizes
mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()
exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
getrusage: use sig->stats_lock rather than lock_task_sighand()
getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
...
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Update a potentially stale firmware attribute (Maurizio)
- Fixes for the recent verbose error logging (Keith, Chaitanya)
- Protection information payload size fix for passthrough (Francis)
- Fix for a queue freezing issue in virtblk (Yi)
- blk-iocost underflow fix (Tejun)
- blk-wbt task detection fix (Jan)
* tag 'block-6.8-2024-02-10' of git://git.kernel.dk/linux:
virtio-blk: Ensure no requests in virtqueues before deleting vqs.
blk-iocost: Fix an UBSAN shift-out-of-bounds warning
nvme: use ns->head->pi_size instead of t10_pi_tuple structure size
nvme-core: fix comment to reflect right functions
nvme: move passthrough logging attribute to head
blk-wbt: Fix detection of dirty-throttled tasks
nvme-host: fix the updating of the firmware version
|
|
Some weird old filesystems have UUID-like things that we wish to expose as
UUIDs, but they are smaller; add a length field so that the new
FS_IOC_(GET|SET)UUID ioctls can handle them in generic code.
Also add a helper, super_set_uuid(), for setting nonstandard-length uuids.
The helper is now required for the new FS_IOC_GETUUID ioctl; if
super_set_uuid() hasn't been called, the ioctl won't be supported.
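The helper is expected to look roughly like this sketch:
        void super_set_uuid(struct super_block *sb, const u8 *uuid, unsigned len)
        {
                if (WARN_ON(len > sizeof(sb->s_uuid)))
                        len = sizeof(sb->s_uuid);
                sb->s_uuid_len = len;
                memcpy(&sb->s_uuid, uuid, len);
        }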
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Dquots pointed to from i_dquot arrays in inodes are protected by
dquot_srcu. Annotate them as such and change .get_dquots callback to
return properly annotated pointer to make sparse happy.
Fixes: b9ba6f94b238 ("quota: remove dqptr_sem")
Signed-off-by: Jan Kara <[email protected]>
|
|
DAMON sysfs interface's update_schemes_tried_regions command has a timeout
of two apply intervals of the DAMOS scheme.  A zero DAMOS scheme apply
interval means it will use the aggregation interval as the value.  However,
the timeout setup logic is mistakenly using the sampling interval instead
of the aggregation interval for that case.  This could cause an
earlier-than-expected timeout of the command.  Fix it.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7d6fa31a2fd7 ("mm/damon/sysfs-schemes: add timeout for update_schemes_tried_regions")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> # 6.7.x
Signed-off-by: Andrew Morton <[email protected]>
|
|
LRU_SKIP can only be returned if we never dropped the lru lock; otherwise
we need to return LRU_RETRY to restart from the head of the lru list, since
the iteration might otherwise continue from a cursor position that was
freed while the lock was dropped.
Actually we may need to introduce another LRU_STOP to really terminate the
ongoing shrinking scan process when we encounter a warm page already in
the swap cache.  The current list_lru implementation doesn't have such a
function to break out of __list_lru_walk_one() early.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
Signed-off-by: Chengming Zhou <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Cc: Chris Li <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In the per-memcg LRU universe, LRU removal uses entry->objcg to determine
which list count needs to be decreased. Drop the objcg reference after
updating the LRU, to fix a possible use-after-free.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware")
Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Yosry Ahmed <[email protected]>
Reviewed-by: Nhat Pham <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In madvise_cold_or_pageout_pte_range(), we need to leave lazy MMU mode
before unlocking.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b2f557a21bc8 ("mm/madvise: add cond_resched() in madvise_cold_or_pageout_pte_range()")
Signed-off-by: Sergey Senozhatsky <[email protected]>
Cc: Jiexun Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When I did a soft offline stress test, a machine was observed to crash with
the following message:
kernel BUG at include/linux/memcontrol.h:554!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 5 PID: 3837 Comm: hwpoison.sh Not tainted 6.7.0-next-20240112-00001-g8ecf3e7fb7c8-dirty #97
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:folio_memcg+0xaf/0xd0
Code: 10 5b 5d c3 cc cc cc cc 48 c7 c6 08 b1 f2 b2 48 89 ef e8 b4 c5 f8 ff 90 0f 0b 48 c7 c6 d0 b0 f2 b2 48 89 ef e8 a2 c5 f8 ff 90 <0f> 0b 48 c7 c6 08 b1 f2 b2 48 89 ef e8 90 c5 f8 ff 90 0f 0b 66 66
RSP: 0018:ffffb6c043657c98 EFLAGS: 00000296
RAX: 000000000000004b RBX: ffff932bc1d1e401 RCX: ffff933abfb5c908
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff933abfb5c900
RBP: ffffea6f04019080 R08: ffffffffb3338ce8 R09: 0000000000009ffb
R10: 00000000000004dd R11: ffffffffb3308d00 R12: ffffea6f04019080
R13: ffffea6f04019080 R14: 0000000000000001 R15: ffffb6c043657da0
FS: 00007f6c60f6b740(0000) GS:ffff933abfb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000559c3bc8b980 CR3: 0000000107f1c000 CR4: 00000000000006f0
Call Trace:
<TASK>
split_huge_page_to_list+0x4d/0x1380
try_to_split_thp_page+0x3a/0xf0
soft_offline_page+0x1ea/0x8a0
soft_offline_page_store+0x52/0x90
kernfs_fop_write_iter+0x118/0x1b0
vfs_write+0x30b/0x430
ksys_write+0x5e/0xe0
do_syscall_64+0xb0/0x1b0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7f6c60d14697
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffe9b72b8d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f6c60d14697
RDX: 000000000000000c RSI: 0000559c3bc8b980 RDI: 0000000000000001
RBP: 0000559c3bc8b980 R08: 00007f6c60dd1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007f6c60e1a780 R14: 00007f6c60e16600 R15: 00007f6c60e15a00
The problem is that page->mapping is now overloaded with the
slab->slab_list or slabs fields, so slab pages could be taken as non-LRU
movable pages if the slabs field contains PAGE_MAPPING_MOVABLE or
slab_list.prev is set to LIST_POISON2.  These slab pages will later be
treated as THP, leading to a crash in split_huge_page_to_list().
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Miaohe Lin <[email protected]>
Fixes: 130d4df57390 ("mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head")
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Miaohe Lin <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In memcg_rstat_updated(), we iterate the memcg being updated and its
parents to update memcg->vmstats_percpu->stats_updates in the fast path
(i.e. no atomic updates). According to my math, this is 3 memory loads
(and potentially 3 cache misses) per memcg:
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).
- Load the address of the parent memcg.
Avoid most of the cache misses by caching a pointer from each struct
memcg_vmstats_percpu to its parent on the corresponding CPU. In this
case, for the first memcg we have 2 memory loads (same as above):
- Load the address of memcg->vmstats_percpu.
- Load vmstats_percpu->stats_updates (based on some percpu calculation).
Then for each additional memcg, we need a single load to get the
parent's stats_updates directly. This reduces the number of loads from
O(3N) to O(2+N) -- where N is the number of memcgs we need to iterate.
Additionally, stash a pointer to memcg->vmstats in each struct
memcg_vmstats_percpu such that we can access the atomic counter that all
CPUs fold into, memcg->vmstats->stats_updates.
memcg_should_flush_stats() is changed to memcg_vmstats_needs_flush() to
accept a struct memcg_vmstats pointer accordingly.
In struct memcg_vmstats_percpu, make sure both pointers together with
stats_updates live on the same cacheline. Finally, update
mem_cgroup_alloc() to take in a parent pointer and initialize the new
cache pointers on each CPU. The percpu loop in mem_cgroup_alloc() may
look concerning, but there are multiple similar loops in the cgroup
creation path (e.g. cgroup_rstat_init()), most of which are hidden
within alloc_percpu().
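Structurally, this amounts to additions of the following shape (a sketch;
the exact field order and alignment annotations are per the patch itself):
        struct memcg_vmstats_percpu {
                /* Cached pointers for fast iteration in memcg_rstat_updated() */
                struct memcg_vmstats_percpu     *parent;        /* parent's percpu stats, this CPU */
                struct memcg_vmstats            *vmstats;       /* memcg->vmstats */

                /* Stats updates since the last flush */
                unsigned int                    stats_updates;

                /* ...existing per-cpu stats and events... */
        };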
According to Oliver's testing [1], this fixes multiple 30-38%
regressions in vm-scalability, will-it-scale-tlb_flush2, and
will-it-scale-fallocate1. This comes at a cost of 2 more pointers per
CPU (<2KB on a machine with 128 CPUs).
[1] https://lore.kernel.org/lkml/ZbDJsfsZt2ITyo61@xsang-OptiPlex-9020/
[[email protected]: fix struct memcg_vmstats_percpu size and alignment]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Fixes: 8d59d2214c23 ("mm: memcg: make stats flushing threshold per-memcg")
Tested-by: kernel test robot <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Greg Thelen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit c33c794828f2 ("mm: ptep_get() conversion") converted all (non-arch)
call sites to use ptep_get() instead of doing a direct dereference of the
pte. Full rationale can be found in that commit's log.
Since then, UFFDIO_MOVE has been implemented which does 7 direct pte
dereferences. Let's fix those up to use ptep_get().
I've asserted in the past that there is no reliable automated mechanism to
catch these; I'm relying on a combination of Coccinelle (which throws up a
lot of false positives) and some compiler magic to force a compiler error
on dereference. But given the frequency with which new issues are coming
up, I'll add it to my todo list to try to find an automated solution.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Ryan Roberts <[email protected]>
Reviewed-by: Suren Baghdasaryan <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The detection of dirty-throttled tasks in blk-wbt has been subtly broken
since its beginning in 2016. Namely if we are doing cgroup writeback and
the throttled task is not in the root cgroup, balance_dirty_pages() will
set dirty_sleep for the non-root bdi_writeback structure. However
blk-wbt checks dirty_sleep only in the root cgroup bdi_writeback
structure. Thus detection of recently throttled tasks is not working in
this case (we noticed this when we switched to cgroup v2 and suddenly
writeback was slow).
Since blk-wbt has no easy way to get to proper bdi_writeback and
furthermore its intention has always been to work on the whole device
rather than on individual cgroups, just move the dirty_sleep timestamp
from bdi_writeback to backing_dev_info. That fixes the checking for
recently throttled task and saves memory for everybody as a bonus.
CC: [email protected]
Fixes: b57d74aff9ab ("writeback: track if we're sleeping on progress in balance_dirty_pages()")
Signed-off-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[axboe: fixup indentation errors]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Introduce a kmemdup_array() API to duplicate `n` elements of a given
array.  It internally uses kmemdup() to allocate and duplicate the `src`
array.
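A minimal sketch of such a helper; the parameter order shown here is an
assumption and should be checked against the final header:
        static inline void *kmemdup_array(const void *src, size_t element_size,
                                          size_t count, gfp_t gfp)
        {
                return kmemdup(src, size_mul(element_size, count), gfp);
        }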
Signed-off-by: Kartik <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: Thierry Reding <[email protected]>
|
|
After commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"),
the parameter 'flags' is only passed as 0 in create_kmalloc_caches(), and
then it is only passed on to new_kmalloc_cache().
So we can change 'flags' to a local variable with initial value 0 in
new_kmalloc_cache() and remove the 'flags' parameter from
create_kmalloc_caches().  Also make new_kmalloc_cache() static, since it
is only used in mm/slab_common.c.
Signed-off-by: Zheng Yejian <[email protected]>
Acked-by: David Rientjes <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
|