Age  Commit message  Author  Files  Lines
2022-07-29selftests: soft-dirty: add test for mprotectPeter Xu1-1/+66
Add two soft-dirty test cases for mprotect() on both anon or file. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Xu <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Nadav Amit <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm/mprotect: fix soft-dirty check in can_change_pte_writable()Peter Xu3-2/+20
Patch series "mm/mprotect: Fix soft-dirty checks", v4.

This patch (of 3):

The check was meant to ensure that, when soft-dirty tracking is enabled, we don't grant the write bit by accident, since a page fault is needed for dirty tracking. The intention is correct, but the check itself was wrong: VM_SOFTDIRTY being set actually means soft-dirty tracking is disabled. Fix it.

Another tricky thing about soft-dirty is that we can't check the vma flag with !(vma_flags & VM_SOFTDIRTY) directly. We may only do so after checking CONFIG_MEM_SOFT_DIRTY, because otherwise VM_SOFTDIRTY is defined as zero and !(vma_flags & VM_SOFTDIRTY) always evaluates to true. To avoid misuse, introduce a helper for checking whether a vma has soft-dirty tracking enabled.

We can easily verify this with any exclusive anonymous page, as in the program below:

=======8<======
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdbool.h>

#define BIT_ULL(nr)                   (1ULL << (nr))
#define PM_SOFT_DIRTY                 BIT_ULL(55)

unsigned int psize;
char *page;

uint64_t pagemap_read_vaddr(int fd, void *vaddr)
{
    uint64_t value;
    int ret;

    ret = pread(fd, &value, sizeof(uint64_t),
                ((uint64_t)vaddr >> 12) * sizeof(uint64_t));
    assert(ret == sizeof(uint64_t));

    return value;
}

void clear_refs_write(void)
{
    int fd = open("/proc/self/clear_refs", O_RDWR);

    assert(fd >= 0);
    write(fd, "4", 2);
    close(fd);
}

#define check_soft_dirty(str, expect) do {                              \
        bool dirty = pagemap_read_vaddr(fd, page) & PM_SOFT_DIRTY;      \
        if (dirty != expect) {                                          \
            printf("ERROR: %s, soft-dirty=%d (expect: %d)\n",           \
                   str, dirty, expect);                                 \
            exit(-1);                                                   \
        }                                                               \
} while (0)

int main(void)
{
    int fd = open("/proc/self/pagemap", O_RDONLY);

    assert(fd >= 0);
    psize = getpagesize();
    page = mmap(NULL, psize, PROT_READ|PROT_WRITE,
                MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    assert(page != MAP_FAILED);

    *page = 1;
    check_soft_dirty("Just faulted in page", 1);
    clear_refs_write();
    check_soft_dirty("Clear_refs written", 0);
    mprotect(page, psize, PROT_READ);
    check_soft_dirty("Marked RO", 0);
    mprotect(page, psize, PROT_READ|PROT_WRITE);
    check_soft_dirty("Marked RW", 0);
    *page = 2;
    check_soft_dirty("Wrote page again", 1);

    munmap(page, psize);
    close(fd);
    printf("Test passed.\n");

    return 0;
}
=======8<======

Here we attach a Fixes to commit 64fe24a3e05e only for easy tracking, as this patch won't apply to a tree before that point. However, that commit wasn't the source of the problem; 64e455079e1b was. It's just that after 64fe24a3e05e anonymous memory will also suffer from this problem with mprotect().

Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Fixes: 64fe24a3e05e ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection") Signed-off-by: Peter Xu <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
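The CONFIG_MEM_SOFT_DIRTY pitfall described in this commit message can be sketched in userspace as follows. This is a minimal illustration, not the kernel helper itself: `demo_vm_softdirty()`, `demo_naive_tracking_enabled()` and `demo_soft_dirty_enabled()` are hypothetical names, the flag value is only a stand-in, and the config option is modelled as a runtime boolean instead of a compile-time macro.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the VM_SOFTDIRTY vma flag bit. */
#define DEMO_VM_SOFTDIRTY_BIT 0x08000000UL

/* Models VM_SOFTDIRTY being defined as zero when the config is off. */
static unsigned long demo_vm_softdirty(bool config_on)
{
	return config_on ? DEMO_VM_SOFTDIRTY_BIT : 0UL;
}

/* Naive check: with the config off the mask is 0, so this is always true. */
static bool demo_naive_tracking_enabled(bool config_on, unsigned long vm_flags)
{
	return !(vm_flags & demo_vm_softdirty(config_on));
}

/* Guarded check: consult the config first, then the flag. */
static bool demo_soft_dirty_enabled(bool config_on, unsigned long vm_flags)
{
	if (!config_on)
		return false;
	/* The flag being set means tracking is disabled for this vma. */
	return !(vm_flags & demo_vm_softdirty(config_on));
}
```

With the config modelled as off, the naive check reports tracking as enabled for every vma, while the guarded variant reports it as disabled.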
2022-07-29mm: memcontrol: fix potential oom_lock recursion deadlockTetsuo Handa1-13/+9
syzbot is reporting GFP_KERNEL allocation with oom_lock held when reporting memcg OOM [1]. If this allocation triggers the global OOM situation then the system can livelock because the GFP_KERNEL allocation with oom_lock held cannot trigger the global OOM killer because __alloc_pages_may_oom() fails to hold oom_lock. Fix this problem by removing the allocation from memory_stat_format() completely, and pass static buffer when calling from memcg OOM path. Note that the caller holding filesystem lock was the trigger for syzbot to report this locking dependency. Doing GFP_KERNEL allocation with filesystem lock held can deadlock the system even without involving OOM situation. Link: https://syzkaller.appspot.com/bug?extid=2d2aeadc6ce1e1f11d45 [1] Link: https://lkml.kernel.org/r/[email protected] Fixes: c8713d0b23123759 ("mm: memcontrol: dump memory.stat during cgroup OOM") Signed-off-by: Tetsuo Handa <[email protected]> Reported-by: syzbot <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm/gup.c: fix formatting in check_and_migrate_movable_page()Alistair Popple1-2/+2
Commit b05a79d4377f ("mm/gup: migrate device coherent pages when pinning instead of failing") added a badly formatted if statement. Fix it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Alistair Popple <[email protected]> Reported-by: David Hildenbrand <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29xfs: fail dax mount if reflink is enabled on a partitionShiyang Ruan1-2/+4
Failure notification is not supported on partitions. So, when we mount a reflink enabled xfs on a partition with dax option, let it fail with -EINVAL code. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Shiyang Ruan <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm/memcontrol.c: remove the redundant updating of stats_flush_thresholdJiebin Sun1-1/+8
Remove the redundant updating of stats_flush_threshold. If the global var stats_flush_threshold has exceeded the trigger value for __mem_cgroup_flush_stats, further increment is unnecessary.

Apply the patch and test the pts/hackbench-1.0.0 Count:4 (160 threads).

Score gain: 1.95x
Reduce CPU cycles in __mod_memcg_lruvec_state (44.88% -> 0.12%)

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/hackbench-1.0.0 Count:4 (160 threads)

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jiebin Sun <[email protected]> Acked-by: Shakeel Butt <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Reviewed-by: Tim Chen <[email protected]> Acked-by: Muchun Song <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Amadeusz Sławiński <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
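The idea of skipping further increments once the threshold has passed its trigger value can be sketched with C11 atomics. This is only an illustration under assumptions: `demo_note_stat_update()` and its `trigger` parameter are hypothetical and do not reproduce the memcontrol.c code.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: bump *threshold only while it has not yet passed
 * the trigger value. Once it has, the shared counter is only read, which
 * avoids repeated atomic writes to the same location from many CPUs.
 */
static void demo_note_stat_update(atomic_int *threshold, int trigger)
{
	if (atomic_load(threshold) <= trigger)
		atomic_fetch_add(threshold, 1);
}
```

After the counter exceeds `trigger`, every further call takes the read-only path, which is the kind of saving the cycle numbers above point at.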
2022-07-29userfaultfd: don't fail on unrecognized featuresAxel Rasmussen1-4/+2
The basic interaction for setting up a userfaultfd is, userspace issues a UFFDIO_API ioctl, and passes in a set of zero or more feature flags, indicating the features they would prefer to use. Of course, different kernels may support different sets of features (depending on kernel version, kconfig options, architecture, etc). Userspace's expectations may also not match: perhaps it was built against newer kernel headers, which defined some features the kernel it's running on doesn't support. Currently, if userspace passes in a flag we don't recognize, the initialization fails and we return -EINVAL. This isn't great, though. Userspace doesn't have an obvious way to react to this; sure, one of the features I asked for was unavailable, but which one? The only option it has is to turn off things "at random" and hope something works. Instead, modify UFFDIO_API to just ignore any unrecognized feature flags. The interaction is now that the initialization will succeed, and as always we return the *subset* of feature flags that can actually be used back to userspace. Now userspace has an obvious way to react: it checks if any flags it asked for are missing. If so, it can conclude this kernel doesn't support those, and it can either resign itself to not using them, or fail with an error on its own, or whatever else. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Axel Rasmussen <[email protected]> Cc: Peter Xu <[email protected]> Cc: Axel Rasmussen <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
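The userspace reaction described above, checking whether any requested flag is missing from what the kernel reported back, reduces to a bitmask comparison. A minimal sketch, assuming the feature sets are plain 64-bit masks; `demo_missing_features()` is a hypothetical helper, not part of the userfaultfd API.

```c
#include <stdint.h>

/*
 * Features that were requested but are absent from the set the kernel
 * returned. A result of zero means everything requested is available.
 */
static uint64_t demo_missing_features(uint64_t requested, uint64_t returned)
{
	return requested & ~returned;
}
```

A caller could treat a non-zero result as "fall back or fail", depending on whether the missing features are optional for it.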
2022-07-29hugetlb_cgroup: fix wrong hugetlb cgroup numa statMiaohe Lin1-0/+1
We forgot to set cft->private for the numa stat file. As a result, the numa stat of hstates[0] is always shown for all hstates. Encode the hstates index into cft->private to fix this issue. Link: https://lkml.kernel.org/r/[email protected] Fixes: f47761999052 ("hugetlb: add hugetlb.*.numa_stat file") Signed-off-by: Miaohe Lin <[email protected]> Acked-by: Muchun Song <[email protected]> Cc: Kees Cook <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Mina Almasry <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
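Why a missing cft->private makes every file report hstates[0] can be seen from an index-in-the-high-bits encoding. The macros below are a sketch under that assumption; the `DEMO_` names and the 16-bit split are illustrative and not copied from hugetlb_cgroup.c.

```c
/* Hypothetical encoding: high bits carry the hstate index, low 16 bits the attribute. */
#define DEMO_MEMFILE_PRIVATE(idx, attr) (((idx) << 16) | (attr))
#define DEMO_MEMFILE_IDX(val)           (((val) >> 16) & 0xffff)
#define DEMO_MEMFILE_ATTR(val)          ((val) & 0xffff)

/* Decode the hstate index from a cft->private style value. */
static int demo_hstate_index(unsigned long cft_private)
{
	return (int)DEMO_MEMFILE_IDX(cft_private);
}
```

A private value left at zero decodes to index 0, so every file would read the first hstate until the index is actually encoded.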
2022-07-29selftest/vm: uninitialized variable in main()Dan Carpenter1-1/+1
Initialize "length" to zero by default. Link: https://lkml.kernel.org/r/YtZzjvHXVXMXxpXO@kili Fixes: ff712a627f72 ("selftests/vm: cleanup hugetlb file after mremap test") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Mina Almasry <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm/cma_debug.c: align the name buffer length as struct cmaKassey Li1-1/+1
Avoids truncating the debugfs output to 16 chars. Potentially alters the userspace output, but this is a debugfs interface and there are no stability guarantees. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kassey Li <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29tools/testing/selftests/vm/hugetlb-madvise.c: silence uninitialized variable warningDan Carpenter1-2/+3
This code just reads from memory without caring about the data itself. However static checkers complain that "tmp" is never properly initialized. Initialize it to zero and change the name to "dummy" to show that we don't care about the value stored in it. Link: https://lkml.kernel.org/r/YtZ8mKJmktA2GaHB@kili Fixes: c4b6cb884011 ("selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Souptick Joarder (HPE) <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm/mempolicy: remove unneeded out labelMiaohe Lin1-3/+1
We can use unlock label to unlock ptl and return ret directly to remove the unneeded out label and reduce the size of mempolicy.o. No functional change intended.

[Before]
   text    data     bss     dec     hex filename
  26702    3972    6168   36842    8fea mm/mempolicy.o

[After]
   text    data     bss     dec     hex filename
  26662    3972    6168   36802    8fc2 mm/mempolicy.o

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm/page_alloc: correct the wrong cpuset file path in commentMark-PK Tsai1-1/+1
cpuset.c was moved to kernel/cgroup/ in below commit 201af4c0fab0 ("cgroup: move cgroup files under kernel/cgroup/") Correct the wrong path in comment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mark-PK Tsai <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm: remove unneeded PageAnon check in restore_exclusive_pte()Miaohe Lin1-1/+1
When code reaches here, the page must be !PageAnon. There's no need to check PageAnon again. Remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29tools/vm/page_owner_sort.c: adjust the indent in is_need()Yixuan Cao1-16/+16
I noticed one more indentation than necessary in is_need(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yixuan Cao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm/shmem: support FS_IOC_[SG]ETFLAGS in tmpfsTheodore Ts'o2-1/+74
This allows userspace to set flags like FS_APPEND_FL, FS_IMMUTABLE_FL, FS_NODUMP_FL, etc., like all other standard Linux file systems. [[email protected]: fix CONFIG_TMPFS_XATTR=n warnings] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
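From userspace, reading these flags goes through the FS_IOC_GETFLAGS ioctl on an open file descriptor. The following is a minimal Linux-only sketch; `demo_get_inode_flags()` is a hypothetical wrapper, and whether the ioctl succeeds depends on the filesystem and kernel in use.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Returns 0 and stores the inode flags in *flags, or -1 with errno set. */
static int demo_get_inode_flags(const char *path, int *flags)
{
	int fd = open(path, O_RDONLY);
	int ret, saved_errno;

	if (fd < 0)
		return -1;
	ret = ioctl(fd, FS_IOC_GETFLAGS, flags);
	saved_errno = errno;	/* close() may overwrite errno */
	close(fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}
```

On success a caller can test individual bits, for example `flags & FS_IMMUTABLE_FL` or `flags & FS_APPEND_FL`.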
2022-07-29mm/damon/reclaim: fix potential memory leak in damon_reclaim_init()Jianglei Nie1-1/+3
damon_reclaim_init() allocates a memory chunk for ctx with damon_new_ctx(). When damon_select_ops() fails, ctx is not released, which will lead to a memory leak. We should release the ctx with damon_destroy_ctx() when damon_select_ops() fails to fix the memory leak. Link: https://lkml.kernel.org/r/[email protected] Fixes: 4d69c3457821 ("mm/damon/reclaim: use damon_select_ops() instead of damon_{v,p}a_set_operations()") Signed-off-by: Jianglei Nie <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm: vmpressure: don't count proactive reclaim in vmpressureYosry Ahmed4-21/+42
memory.reclaim is a cgroup v2 interface that allows users to proactively reclaim memory from a memcg, without real memory pressure. Reclaim operations invoke vmpressure, which is used:

(a) To notify userspace of reclaim efficiency in cgroup v1, and
(b) As a signal for a memcg being under memory pressure for networking (see mem_cgroup_under_socket_pressure()).

For (a), vmpressure notifications in v1 are not affected by this change since memory.reclaim is a v2 feature.

For (b), the effects of the vmpressure signal (according to Shakeel [1]) are as follows:

1. Reducing send and receive buffers of the current socket.
2. May drop packets on the rx path.
3. May throttle current thread on the tx path.

Since proactive reclaim is invoked directly by userspace, not by memory pressure, it makes sense not to throttle networking. Hence, this change makes sure that proactive reclaim caused by memory.reclaim does not trigger vmpressure.

[1] https://lore.kernel.org/lkml/CALvZod68WdrXEmBpOkadhB5GPYmCXaDZzXH=yyGOCAjFRn4NDQ@mail.gmail.com/

[[email protected]: update documentation] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yosry Ahmed <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: NeilBrown <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Peter Xu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29zsmalloc: zs_malloc: return ERR_PTR on failureHui Zhu2-7/+10
zs_malloc returns 0 if it fails. zs_zpool_malloc will return -1 when zs_malloc returns 0. But -1 makes the return value unclear. For example, when zswap_frontswap_store calls zs_malloc through zs_zpool_malloc, it will return -1 to its caller. The other return values are -EINVAL, -ENODEV or something else. This commit changes zs_malloc to return ERR_PTR on failure. It doesn't just make zs_zpool_malloc return -ENOMEM because zs_malloc has two types of failure:

- size is not OK: return -EINVAL
- memory allocation fails: return -ENOMEM

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Hui Zhu <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
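The benefit of returning an encoded error instead of 0 can be sketched in userspace. The helpers below imitate the error-pointer idea, with small negative errno values stored at the very top of the address range; all `demo_` names are hypothetical, and this is not the zsmalloc implementation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value as a handle. */
static uintptr_t demo_err_handle(long err)
{
	return (uintptr_t)err;
}

/* A handle in the top DEMO_MAX_ERRNO values of the range is an error. */
static int demo_is_err(uintptr_t handle)
{
	return handle >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Recover the negative errno value from an error handle. */
static long demo_handle_err(uintptr_t handle)
{
	return (long)(intptr_t)handle;
}

/* Hypothetical allocator with two distinguishable failure modes. */
static uintptr_t demo_alloc(size_t size, size_t max_size)
{
	void *p;

	if (size == 0 || size > max_size)
		return demo_err_handle(-EINVAL);	/* size is not OK */
	p = malloc(size);
	if (!p)
		return demo_err_handle(-ENOMEM);	/* allocation failed */
	return (uintptr_t)p;
}
```

A caller checks `demo_is_err()` first and then propagates `demo_handle_err()`, so the two failure modes stay distinguishable all the way up the call chain.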
2022-07-29writeback: remove inode_to_wb_is_valid()Xiu Jianfeng1-17/+0
inode_to_wb_is_valid() is no longer used since commit fe55d563d417 ("remove inode_congested()"), remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Xiu Jianfeng <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29memblock,arm64: expand the static memblock memory tableZhou Guanghui2-5/+18
On a system (Huawei Ascend ARM64 SoC) using HBM, when a multi-bit ECC error occurs, the BIOS marks the corresponding area (for example, 2 MB) as unusable. When the system next restarts, these areas are either not reported or reported as EFI_UNUSABLE_MEMORY. Both cases increase the number of memblocks, and the EFI_UNUSABLE_MEMORY case increases it more.

For example, if the EFI_UNUSABLE_MEMORY type is reported:

...
memory[0x92] [0x0000200834a00000-0x0000200835bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x93] [0x0000200835c00000-0x0000200835dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x94] [0x0000200835e00000-0x00002008367fffff], 0x0000000000a00000 bytes on node 7 flags: 0x0
memory[0x95] [0x0000200836800000-0x00002008369fffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x96] [0x0000200836a00000-0x0000200837bfffff], 0x0000000001200000 bytes on node 7 flags: 0x0
memory[0x97] [0x0000200837c00000-0x0000200837dfffff], 0x0000000000200000 bytes on node 7 flags: 0x4
memory[0x98] [0x0000200837e00000-0x000020087fffffff], 0x0000000048200000 bytes on node 7 flags: 0x0
memory[0x99] [0x0000200880000000-0x0000200bcfffffff], 0x0000000350000000 bytes on node 6 flags: 0x0
memory[0x9a] [0x0000200bd0000000-0x0000200bd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9b] [0x0000200bd0200000-0x0000200bd07fffff], 0x0000000000600000 bytes on node 6 flags: 0x0
memory[0x9c] [0x0000200bd0800000-0x0000200bd09fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9d] [0x0000200bd0a00000-0x0000200fcfffffff], 0x00000003ff600000 bytes on node 6 flags: 0x0
memory[0x9e] [0x0000200fd0000000-0x0000200fd01fffff], 0x0000000000200000 bytes on node 6 flags: 0x4
memory[0x9f] [0x0000200fd0200000-0x0000200fffffffff], 0x000000002fe00000 bytes on node 6 flags: 0x0
...

The EFI memory map is parsed to construct the memblock arrays before the memblock arrays can be resized. As a result, memory regions beyond INIT_MEMBLOCK_REGIONS are lost.

Add a new macro INIT_MEMBLOCK_MEMORY_REGIONS to replace INIT_MEMBLOCK_REGIONS to define the size of the static memblock.memory array.

Allow overriding the memblock.memory array size with an architecture-defined INIT_MEMBLOCK_MEMORY_REGIONS and make arm64 set INIT_MEMBLOCK_MEMORY_REGIONS to 1024 when CONFIG_EFI is enabled.

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhou Guanghui <[email protected]> Acked-by: Mike Rapoport <[email protected]> Tested-by: Darren Hart <[email protected]> Acked-by: Will Deacon <[email protected]> [arm64] Reviewed-by: Anshuman Khandual <[email protected]> Cc: Xu Qiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm: remove obsolete comment in do_fault_around()Miaohe Lin1-4/+0
Since commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map"), do_fault_around is not called with page table lock held. Cleanup the corresponding comments. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm: compaction: include compound page count for scanning in pageblock isolationWilliam Lam1-0/+3
The number of scanned pages can be lower than the number of isolated pages when isolating migratable or free pageblocks. The metric is reported in a trace event and also used in vmstat.

Some example output from a trace, showing that nr_taken can be greater than nr_scanned:

Produced by kernel v5.19-rc6
kcompactd0-42 [001] ..... 1210.268022: mm_compaction_isolate_migratepages: range=(0x107ae4 ~ 0x107c00) nr_scanned=265 nr_taken=255
[...]
kcompactd0-42 [001] ..... 1210.268382: mm_compaction_isolate_freepages: range=(0x215800 ~ 0x215a00) nr_scanned=13 nr_taken=128
kcompactd0-42 [001] ..... 1210.268383: mm_compaction_isolate_freepages: range=(0x215600 ~ 0x215680) nr_scanned=1 nr_taken=128

mm_compaction_isolate_migratepages does not seem to have this behaviour, but for consistency nr_scanned should also be taken care of on that side.

This behaviour is confusing, since the count of isolated pages currently takes compound pages into account while the count of scanned pages does not. Given that the number of isolated pages (nr_taken) reported in the mm_compaction_isolate_template trace event is on a single-page basis, the ambiguity when reporting the number of scanned pages can be removed by also including the compound page count.

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: William Lam <[email protected]> Reviewed-by: Punit Agrawal <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29selftests/vm: skip 128TBswitch on unsupported archAdam Sindelar1-4/+4
The test va_128TBswitch.c exercises a feature only supported on PPC and x86_64, but it's run on other 64-bit archs as well. Before this patch, the test did nothing and returned 0 for KSFT_PASS. This patch makes it return the KSFT codes from kselftest.h, including KSFT_SKIP when appropriate. Verified on arm64 and x86_64. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Adam Sindelar <[email protected]> Cc: David Vernet <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29selftests/vm: fix errno handling in mrelease_testAdam Sindelar1-5/+11
mrelease_test should return KSFT_SKIP when process_mrelease is not defined, but due to a perror call consuming the errno, it returns KSFT_FAIL. This patch decides the exit code before calling perror. [[email protected]: fix remaining instances of errno mishandling] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 33776141b812 ("selftests: vm: add process_mrelease tests") Signed-off-by: Adam Sindelar <[email protected]> Reviewed-by: David Vernet <[email protected]> Reviewed-by: Suren Baghdasaryan <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
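The ordering issue, deciding the exit code before any library call gets a chance to overwrite errno, can be sketched as below. The `DEMO_KSFT_*` constants and both helpers are hypothetical stand-ins for the kselftest definitions, and ENOSYS is assumed to be the "not supported" indication.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the exit codes in kselftest.h. */
#define DEMO_KSFT_FAIL 1
#define DEMO_KSFT_SKIP 4

/* Map an errno value to a test exit code. */
static int demo_exit_code_for_errno(int err)
{
	return err == ENOSYS ? DEMO_KSFT_SKIP : DEMO_KSFT_FAIL;
}

/* Decide the exit code first, then report: perror() may change errno. */
static int demo_report_failure(const char *what)
{
	int code = demo_exit_code_for_errno(errno);

	perror(what);
	return code;
}
```

Reading errno after perror() instead would risk classifying an unsupported syscall as a plain failure.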
2022-07-29mm: memcontrol: do not miss MEMCG_MAX events for enforced allocationsRoman Gushchin1-0/+9
Yafang Shao reported an issue related to the accounting of bpf memory: if a bpf map is charged indirectly for memory consumed from an interrupt context and allocations are enforced, MEMCG_MAX events are not raised. It's not/less of an issue in a generic case because consequent allocations from a process context will trigger the direct reclaim and MEMCG_MAX events will be raised. However a bpf map can belong to a dying/abandoned memory cgroup, so there will be no allocations from a process context and no MEMCG_MAX events will be triggered. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Reported-by: Yafang Shao <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29filemap: minor cleanup for filemap_write_and_wait_rangeMiaohe Lin1-12/+6
Restructure the logic in filemap_write_and_wait_range to simplify the code and make it more consistent with file_write_and_wait_range. No functional change intended. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm/mmap.c: fix missing call to vm_unacct_memory in mmap_regionMiaohe Lin1-1/+0
Since the beginning, charged has been set to 0 to avoid calling vm_unacct_memory twice, because vm_unacct_memory would be called by the unmap_region above. But since commit 4f74d2c8e827 ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces"), unmap_region doesn't call vm_unacct_memory anymore. So charged shouldn't be set to 0 now; otherwise the paired call to vm_unacct_memory is missed, which leads to imbalanced accounting. Link: https://lkml.kernel.org/r/[email protected] Fixes: 4f74d2c8e827 ("vm: remove 'nr_accounted' calculations from the unmap_vmas() interfaces") Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29android: binder: fix lockdep check on clearing vmaLiam Howlett1-1/+8
When munmapping a vma, the mmap_lock can be downgraded to a read lock before calling close() on the file handle. The binder close() function calls binder_alloc_set_vma() to clear the vma address, which now has a lockdep check for the mmap_lock being held for writing. Change the lockdep check to ensure the read lock is held while clearing, and keep the write check while writing. Link: https://lkml.kernel.org/r/[email protected] Fixes: 472a68df605b ("android: binder: stop saving a pointer to the VMA") Signed-off-by: Liam R. Howlett <[email protected]> Reported-by: [email protected] Acked-by: Todd Kjos <[email protected]> Cc: "Arve Hjønnevåg" <[email protected]> Cc: Christian Brauner (Microsoft) <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hridya Valsaraju <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Martijn Coenen <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29android: binder: stop saving a pointer to the VMALiam R. Howlett3-18/+16
Do not record a pointer to a VMA outside of the mmap_lock for later use. This is unsafe and there are a number of failure paths *after* the recorded VMA pointer may be freed during setup. There is no callback to the driver to clear the saved pointer from generic mm code. Furthermore, the VMA pointer may become stale if any number of VMA operations end up freeing the VMA, so saving it was fragile to begin with. Instead, change the binder_alloc struct to record the start address of the VMA and use vma_lookup() to get the vma when needed. Add lockdep mmap_lock checks on updates to the vma pointer to ensure the lock is held and depend on that lock for synchronization of readers and writers, which was already the case anyway, so the smp_wmb()/smp_rmb() was not necessary. [[email protected]: fix drivers/android/binder_alloc_selftest.c] Link: https://lkml.kernel.org/r/20220621140212.vpkio64idahetbyf@revolver Fixes: da1b9564e85b ("android: binder: fix the race mmap and alloc_new_buf_locked") Reported-by: [email protected] Signed-off-by: Liam R. Howlett <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Christian Brauner (Microsoft) <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hridya Valsaraju <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Martijn Coenen <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Todd Kjos <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mips: rename mt_init to mips_mt_initLiam R. Howlett1-2/+2
Move mt_init out of the way for the maple tree. Use mips_mt prefix to match the rest of the functions in the file. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Liam R. Howlett <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Howells <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-29mm: shrinkers: fix double kfree on shrinker nameTetsuo Handa2-2/+8
syzbot is reporting double kfree() at free_prealloced_shrinker() [1], for destroy_unused_super() calls free_prealloced_shrinker() even if prealloc_shrinker() returned an error. Explicitly clear shrinker name when prealloc_shrinker() called kfree(). [[email protected]: zero shrinker->name in all cases where shrinker->name is freed] Link: https://lkml.kernel.org/r/YtgteTnQTgyuKUSY@castle Link: https://syzkaller.appspot.com/bug?extid=8b481578352d4637f510 [1] Link: https://lkml.kernel.org/r/[email protected] Fixes: e33c267ab70de424 ("mm: shrinkers: provide shrinkers with names") Reported-by: syzbot <[email protected]> Signed-off-by: Tetsuo Handa <[email protected]> Acked-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
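The fix pattern, clearing the pointer wherever the name is freed so that a later cleanup pass becomes a no-op, can be sketched in userspace C. `struct demo_shrinker` and `demo_free_name()` are hypothetical, with free() standing in for kfree().

```c
#include <stdlib.h>

/* Hypothetical stand-in for a shrinker that owns a heap-allocated name. */
struct demo_shrinker {
	char *name;
};

/* Free the name and clear the pointer so a second call is harmless. */
static void demo_free_name(struct demo_shrinker *s)
{
	free(s->name);
	s->name = NULL;	/* free(NULL) is a no-op, so a repeat call is safe */
}
```

Without the reset, an error path and a later destructor that both free the name would release the same allocation twice.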
2022-07-26selftests/vm: fix va_128TBswitch.sh permissionsAdam Sindelar1-0/+0
Restore the +x bit to va_128TBswitch.sh, which got dropped from the previous patch, somehow. Link: https://lkml.kernel.org/r/[email protected] Fixes: 1afd01d43efc3 ("selftests/vm: Only run 128TBswitch with 5-level paging") Signed-off-by: Adam Sindelar <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17zram: fix unused 'zram_wb_devops' warningKefeng Wang1-0/+2
drivers/block/zram/zram_drv.c:55:45: warning: 'zram_wb_devops' defined but not used [-Wunused-const-variable=]

Fix the above warning, which appears when CONFIG_ZRAM_WRITEBACK is not enabled. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nitin Gupta <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17writeback: cleanup bdi_sched_wait()Xiu Jianfeng1-6/+0
bdi_sched_wait() is no longer used since commit 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue"), so remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Xiu Jianfeng <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Acked-by: Jens Axboe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/mmap: fix obsolete comment of find_extend_vmaMiaohe Lin1-1/+0
mmget_still_valid() has already been removed via commit 4d45e75a9955 ("mm: remove the now-unnecessary mmget_still_valid() hack"). Update the corresponding comment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/page_vma_mapped.c: use helper function huge_pte_lockMiaohe Lin1-2/+1
Use helper function huge_pte_lock() to lock the huge pte to simplify the code a bit. No functional change intended. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/page_alloc: use try_cmpxchg in set_pfnblock_flags_maskUros Bizjak1-7/+3
Use try_cmpxchg instead of cmpxchg in set_pfnblock_flags_mask. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). The main loop improves from:

    1c5d:  48 89 c2          mov    %rax,%rdx
    1c60:  48 89 c1          mov    %rax,%rcx
    1c63:  48 21 fa          and    %rdi,%rdx
    1c66:  4c 09 c2          or     %r8,%rdx
    1c69:  f0 48 0f b1 16    lock cmpxchg %rdx,(%rsi)
    1c6e:  48 39 c1          cmp    %rax,%rcx
    1c71:  75 ea             jne    1c5d <...>

to:

    1c60:  48 89 ca          mov    %rcx,%rdx
    1c63:  48 21 c2          and    %rax,%rdx
    1c66:  4c 09 c2          or     %r8,%rdx
    1c69:  f0 48 0f b1 16    lock cmpxchg %rdx,(%rsi)
    1c6e:  75 f0             jne    1c60 <...>

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uros Bizjak <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
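The loop shape can be approximated in portable C11, where atomic_compare_exchange_weak() likewise returns a success flag and refreshes the expected value on failure, so no separate comparison is needed. This is an analogy under assumptions, not the kernel's try_cmpxchg(); `demo_set_flags_mask()` is a hypothetical function.

```c
#include <stdatomic.h>

/*
 * Atomically replace the bits selected by mask with flags.
 * On failure the compare-exchange stores the current value into `old`,
 * so the loop simply recomputes `new` and retries.
 */
static void demo_set_flags_mask(_Atomic unsigned long *word,
				unsigned long flags, unsigned long mask)
{
	unsigned long old = atomic_load(word);
	unsigned long new;

	do {
		new = (old & ~mask) | flags;
	} while (!atomic_compare_exchange_weak(word, &old, new));
}
```

Because the boolean result drives the loop directly, a compiler targeting x86 can branch on the flag set by the instruction itself instead of emitting an extra compare.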
2022-07-17mm, hugetlb: skip irrelevant nodes in show_free_areas()Gang Li3-14/+16
show_free_areas() allows filtering out node-specific data that is irrelevant to the allocation request. But hugetlb_show_meminfo() still shows hugetlb on all nodes, which is redundant and unnecessary. Use show_mem_node_skip() to skip irrelevant nodes, and replace hugetlb_show_meminfo() with hugetlb_show_meminfo_node(nid).

Before-and-after sample output of OOM:

before:
```
[ 214.362453] Node 1 active_anon:148kB inactive_anon:4050920kB active_file:112kB inactive_file:100kB
[ 214.375429] Node 1 Normal free:45100kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig
[ 214.388334] lowmem_reserve[]: 0 0 0 0 0
[ 214.390251] Node 1 Normal: 423*4kB (UE) 320*8kB (UME) 187*16kB (UE) 117*32kB (UE) 57*64kB (UME) 20
[ 214.397626] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 214.401518] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
```

after:
```
[ 145.069705] Node 1 active_anon:128kB inactive_anon:4049412kB active_file:56kB inactive_file:84kB u
[ 145.110319] Node 1 Normal free:45424kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig
[ 145.152315] lowmem_reserve[]: 0 0 0 0 0
[ 145.155244] Node 1 Normal: 470*4kB (UME) 373*8kB (UME) 247*16kB (UME) 168*32kB (UE) 86*64kB (UME)
[ 145.164119] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
```

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Gang Li <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm: percpu: use kmemleak_ignore_phys() instead of kmemleak_free()Patrick Wang1-3/+3
Kmemleak recently added an rbtree to store the objects allocated with physical address. Those objects can't be freed with kmemleak_free(). According to the comments, percpu allocations are tracked by kmemleak separately. kmemleak_free() was used to avoid the unnecessary tracking. If kmemleak_free() fails, those objects would be scanned by kmemleak, which is unnecessary but shouldn't lead to other effects. Use kmemleak_ignore_phys() instead of kmemleak_free() for those objects. Link: https://lkml.kernel.org/r/[email protected] Fixes: 0c24e061196c ("mm: kmemleak: add rbtree and store physical address for objects allocated with PA") Signed-off-by: Patrick Wang <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Catalin Marinas <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/mprotect: remove the redundant initialization for errorXiu Jianfeng1-1/+1
The variable error is always assigned before it is used, so the initialization is redundant. Remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Xiu Jianfeng <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/huge_memory: use helper macro IS_ERR_OR_NULL in split_huge_pages_pidMiaohe Lin1-3/+1
Use helper macro IS_ERR_OR_NULL to check the validity of page to simplify the code. Minor readability improvement. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
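A minimal userspace sketch of what this kind of cleanup looks like. The `ERR_PTR()`, `IS_ERR()` and `IS_ERR_OR_NULL()` definitions below are simplified re-creations written for this example, not the kernel headers, and `page_usable_open_coded()` / `page_usable_helper()` are hypothetical functions rather than the code in split_huge_pages_pid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Open-coded form: two separate checks. */
bool page_usable_open_coded(const void *page)
{
	if (IS_ERR(page))
		return false;
	if (!page)
		return false;
	return true;
}

/* Same decision expressed with the combined helper. */
bool page_usable_helper(const void *page)
{
	return !IS_ERR_OR_NULL(page);
}

/* The two forms agree for valid, NULL and error pointers. */
void demo_is_err_or_null(void)
{
	int object = 0;
	const void *cases[] = { &object, NULL, ERR_PTR(-12) };

	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		assert(page_usable_open_coded(cases[i]) ==
		       page_usable_helper(cases[i]));
}
```

The helper folds the NULL check and the error-pointer check into one predicate, which is where the readability gain comes from.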
2022-07-17mm/huge_memory: comment the subtly logic in __split_huge_pmdMiaohe Lin1-0/+4
It's dangerous and wrong to call page_folio(pmd_page(*pmd)) when pmd isn't present. But the caller guarantees pmd is present when folio is set. So we should be safe here. Add comment to make it clear. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/huge_memory: correct comment of prep_transhuge_pageMiaohe Lin1-1/+1
We use page->mapping and page->index, instead of page->indexlru in second tail page as list_head. Correct it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/huge_memory: fix comment of page_deferred_listMiaohe Lin1-2/+2
The current comment is confusing: if the global or memcg deferred list in the second tail page is occupied by compound_head, why do we still use page[2].deferred_list here? I think it wants to say that the global or memcg deferred list in the first tail page is occupied by compound_mapcount and compound_pincount, so we use the second tail page's deferred_list instead. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/huge_memory: minor cleanup for split_huge_pages_allMiaohe Lin1-1/+6
There is nothing to do if a zone doesn't have any pages managed by the buddy allocator. So we should check managed_zone instead. Also if a thp is found, there's no need to traverse the subpages again. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/huge_memory: try to free subpage in swapcache when possibleMiaohe Lin1-1/+1
Subpages in the swapcache won't be freed until the next reclaim, even if the swapcache is the last user of the page. This is harmless, but we could try to free these pages to save more memory for the system. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/huge_memory: check pmd_present first in is_huge_zero_pmdMiaohe Lin1-1/+1
When pmd is non-present, pmd_pfn returns an insane value. So we should check pmd_present first to avoid acquiring such insane value and also avoid touching possible cold huge_zero_pfn cache line when pmd isn't present. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
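The effect of the check ordering can be sketched with a mock entry type. Everything here is an assumption made for illustration: `mock_pmd_t`, its bit layout, `mock_huge_zero_pfn` and the `pfn_decodes` counter are hypothetical and do not reflect any real architecture's PMD format. The point is only that `&&` short-circuits, so the PFN is never decoded (and the zero-PFN variable never read) for a non-present entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry layout: bit 0 = present, bits 12 and up = PFN. */
typedef struct { uint64_t val; } mock_pmd_t;

#define MOCK_PRESENT   0x1ULL
#define MOCK_PFN_SHIFT 12

uint64_t mock_huge_zero_pfn = 0x1234;
unsigned int pfn_decodes; /* counts how often the PFN was decoded */

static bool mock_pmd_present(mock_pmd_t pmd)
{
	return (pmd.val & MOCK_PRESENT) != 0;
}

/* For a non-present entry the upper bits are not a PFN at all. */
static uint64_t mock_pmd_pfn(mock_pmd_t pmd)
{
	pfn_decodes++;
	return pmd.val >> MOCK_PFN_SHIFT;
}

/* Present check first: the right-hand side runs only for present entries. */
bool mock_is_huge_zero_pmd(mock_pmd_t pmd)
{
	return mock_pmd_present(pmd) &&
	       mock_huge_zero_pfn == mock_pmd_pfn(pmd);
}

void demo_present_first(void)
{
	mock_pmd_t zero = { (0x1234ULL << MOCK_PFN_SHIFT) | MOCK_PRESENT };
	mock_pmd_t not_present = { 0x1234ULL << MOCK_PFN_SHIFT };

	pfn_decodes = 0;
	assert(mock_is_huge_zero_pmd(zero));
	assert(pfn_decodes == 1);
	/* Same upper bits, but not present: no decode, no false match. */
	assert(!mock_is_huge_zero_pmd(not_present));
	assert(pfn_decodes == 1);
}
```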
2022-07-17mm/huge_memory: fix comment in zap_huge_pudMiaohe Lin1-6/+1
The comment about deposited pgtable is borrowed from zap_huge_pmd but there's no deposited pgtable stuff for huge pud in zap_huge_pud. Remove it to avoid confusion. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Muchun Song <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-07-17mm/huge_memory: use helper macro __ATTR_RWMiaohe Lin1-6/+4
Use helper macro __ATTR_RW to define use_zero_page_attr, defrag_attr and enabled_attr to make code more clear. Minor readability improvement. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zach O'Keefe <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
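As a rough userspace illustration of the naming convention the helper relies on: the read-write variant derives the `_show` and `_store` callbacks from the attribute name and fixes the mode at 0644. `struct mock_attr`, `MOCK_ATTR()` and `MOCK_ATTR_RW()` below are hypothetical stand-ins modelled on that convention, not the kernel's sysfs definitions, and the callback signatures are simplified.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified attribute descriptor. */
struct mock_attr {
	const char *name;
	unsigned int mode;
	int (*show)(char *buf, size_t len);
	int (*store)(const char *buf);
};

/* Long form: every field spelled out by the caller. */
#define MOCK_ATTR(_name, _mode, _show, _store) \
	{ .name = #_name, .mode = (_mode), .show = (_show), .store = (_store) }

/* Short form: mode 0644, callbacks derived from the attribute name. */
#define MOCK_ATTR_RW(_name) \
	MOCK_ATTR(_name, 0644, _name##_show, _name##_store)

static int enabled_value = 1;

static int enabled_show(char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", enabled_value);
}

static int enabled_store(const char *buf)
{
	return sscanf(buf, "%d", &enabled_value) == 1 ? 0 : -1;
}

/* The two definitions describe the same attribute. */
struct mock_attr enabled_attr_long =
	MOCK_ATTR(enabled, 0644, enabled_show, enabled_store);
struct mock_attr enabled_attr = MOCK_ATTR_RW(enabled);

void demo_attr_rw(void)
{
	assert(strcmp(enabled_attr.name, enabled_attr_long.name) == 0);
	assert(enabled_attr.mode == enabled_attr_long.mode);
	assert(enabled_attr.show == enabled_attr_long.show);
	assert(enabled_attr.store == enabled_attr_long.store);
}
```

The short form only works when the callbacks follow the `<name>_show` / `<name>_store` naming pattern.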