path: root/mm
2023-02-20  Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping  (Linus Torvalds, 4 files changed, -34/+43 lines)

Pull vfs idmapping updates from Christian Brauner:

 - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message, this converts everything to rely on struct mnt_idmap.

   Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient, but it makes it easy to conflate namespaces that are relevant on the filesystem level with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area, this was a potential source of bugs.

   This finishes the conversion. Instead of passing the plain namespace around, this updates all places that currently take a pointer to a mnt_userns with a pointer to struct mnt_idmap. Now that the conversion is done, all helpers down to the really low-level ones accept a struct mnt_idmap argument instead of two namespace arguments. Conflating mount and other idmappings will now cause the compiler to complain loudly, eliminating the possibility of such bugs. This makes it impossible for filesystem developers to mix up mount and filesystem idmappings, as they are two distinct types and require distinct helpers that cannot be used interchangeably.

   Everything associated with struct mnt_idmap is moved into a single separate file. With that change no code can poke around in struct mnt_idmap; it can only be interacted with through dedicated helpers. That means all filesystems and all of the vfs are completely oblivious to the actual implementation of idmappings.

   We are now also able to extend struct mnt_idmap as we see fit. For example, we can decouple it completely from namespaces for users that don't require or don't want to use them at all. We can also extend the concept of idmappings to cover filesystem-specific requirements. In combination with the vfs{g,u}id_t work we finished in v6.2, this makes the feature substantially more robust, more difficult for a given filesystem to implement incorrectly, and it also protects the vfs.

 - Enable idmapped mounts for tmpfs, fulfilling a longstanding request.

   A long-standing request from users had been to make it possible to create idmapped mounts for tmpfs, for example to share the host's tmpfs mount between multiple sandboxes. This is a prerequisite for some advanced Kubernetes use cases. Systemd also has a range of use cases to increase service isolation, and there are more users of this. However, with all of the other work going on this was way down on the priority list, but luckily someone other than ourselves picked it up.

   As usual the patch is tiny, as all the infrastructure work had been done multiple kernel releases ago. In addition to all the tests that we already have, I requested that Rodrigo add a dedicated tmpfs testsuite for idmapped mounts to xfstests. It is to be included into xfstests during the v6.3 development cycle and should add a slew of additional tests.
* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
  shmem: support idmapped mounts for tmpfs
  fs: move mnt_idmap
  fs: port vfs{g,u}id helpers to mnt_idmap
  fs: port fs{g,u}id helpers to mnt_idmap
  fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
  fs: port i_{g,u}id_{needs_}update() to mnt_idmap
  quota: port to mnt_idmap
  fs: port privilege checking helpers to mnt_idmap
  fs: port inode_owner_or_capable() to mnt_idmap
  fs: port inode_init_owner() to mnt_idmap
  fs: port acl to mnt_idmap
  fs: port xattr to mnt_idmap
  fs: port ->permission() to pass mnt_idmap
  fs: port ->fileattr_set() to pass mnt_idmap
  fs: port ->set_acl() to pass mnt_idmap
  fs: port ->get_acl() to pass mnt_idmap
  fs: port ->tmpfile() to pass mnt_idmap
  fs: port ->rename() to pass mnt_idmap
  fs: port ->mknod() to pass mnt_idmap
  fs: port ->mkdir() to pass mnt_idmap
  ...
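For readers unfamiliar with the feature this enables for tmpfs: the sketch below shows how userspace typically creates an idmapped mount today via open_tree(2), mount_setattr(2) and move_mount(2). It is a minimal illustration, not part of this series; the helper name and the error handling are simplified, and userns_fd is assumed to refer to a user namespace carrying the desired ID mappings.

    /* Minimal sketch: create an idmapped mount of an existing (e.g. tmpfs)
     * mount at src and attach it at dst. Abbreviated error handling. */
    #include <fcntl.h>
    #include <linux/mount.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static int make_idmapped_mount(const char *src, const char *dst,
                                   int userns_fd)
    {
            struct mount_attr attr = {
                    .attr_set = MOUNT_ATTR_IDMAP,
                    .userns_fd = userns_fd,
            };
            /* Detach a private copy of the mount tree at src... */
            int fd_tree = syscall(SYS_open_tree, AT_FDCWD, src,
                                  OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
            if (fd_tree < 0)
                    return -1;
            /* ...apply the idmapping to the detached copy... */
            if (syscall(SYS_mount_setattr, fd_tree, "", AT_EMPTY_PATH,
                        &attr, sizeof(attr)) < 0)
                    return -1;
            /* ...and attach it at the destination. */
            if (syscall(SYS_move_mount, fd_tree, "", AT_FDCWD, dst,
                        MOVE_MOUNT_F_EMPTY_PATH) < 0)
                    return -1;
            close(fd_tree);
            return 0;
    }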
2023-02-20  Merge tag 'remove-get_kernel_pages-for-6.3' of https://git.linaro.org/people/jens.wiklander/linux-tee  (Linus Torvalds, 1 file changed, -30/+0 lines)

Pull TEE update from Jens Wiklander:
 "Remove get_kernel_pages()

  Vmalloc page support is removed from shm_get_kernel_pages() and the get_kernel_pages() call is replaced by calls to get_page(). With no remaining callers of get_kernel_pages(), the function is removed."

[ This looks like it's just some random 'tee' cleanup, but the bigger-picture impetus is really to remove historical confusion over the mixed use of kernel virtual addresses and 'struct page' pointers.

  Kernel virtual pointers into the vmalloc space are particularly confusing - both for looking up a page pointer (when trying to unify a "virtual address or page" interface) and _particularly_ when mixed with HIGHMEM support and the kmap*() family of remapping functions. This is all the more true with HIGHMEM getting much less test coverage as 32-bit architectures become increasingly legacy targets.

  So we actively wanted to remove get_kernel_pages() to make sure nobody else used it either, and the 'tee' part is thus "finally remove the last user".

  See also commit 6647e76ab623 ("v4l2: don't fall back to follow_pfn() if pin_user_pages_fast() fails") for a totally different version of a conceptually similar "let's stop this confusion of different ways of referring to memory".  - Linus ]

* tag 'remove-get_kernel_pages-for-6.3' of https://git.linaro.org/people/jens.wiklander/linux-tee:
  mm: Remove get_kernel_pages()
  tee: Remove call to get_kernel_pages()
  tee: Remove vmalloc page support
  highmem: Enhance is_kmap_addr() to check kmap_local_page() mappings
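As a rough illustration of the replacement pattern (a hedged sketch, not the literal tee diff): a caller that previously handed a contiguous kernel buffer to get_kernel_pages() can take page references directly, assuming the buffer is page-aligned, physically contiguous kernel memory outside the vmalloc area.

    /* Illustrative kernel-style fragment: pin the pages backing such a
     * buffer with get_page() instead of the removed get_kernel_pages(). */
    static size_t pin_kernel_buffer_pages(unsigned long start,
                                          size_t page_count,
                                          struct page **pages)
    {
            struct page *page = virt_to_page((void *)start);
            size_t n;

            for (n = 0; n < page_count; n++) {
                    pages[n] = page + n;
                    get_page(pages[n]);     /* take a reference per page */
            }
            return page_count;
    }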
2023-02-17  Merge tag 'mm-hotfixes-stable-2023-02-17-15-16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  (Linus Torvalds, 4 files changed, -4/+10 lines)

Pull misc fixes from Andrew Morton:
 "Six hotfixes. Five are cc:stable: four for MM, one for nilfs2. Also a MAINTAINERS update."

* tag 'mm-hotfixes-stable-2023-02-17-15-16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nilfs2: fix underflow in second superblock position calculations
  hugetlb: check for undefined shift on 32 bit architectures
  mm/migrate: fix wrongly apply write bit after mkdirty on sparc64
  MAINTAINERS: update FPU EMULATOR web page
  mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount
  mm/filemap: fix page end in filemap_get_read_batch
2023-02-17  mm/migrate: fix wrongly apply write bit after mkdirty on sparc64  (Peter Xu, 2 files changed, -2/+6 lines)

Nick Bowler reported another sparc64 breakage after the young/dirty persistence work for page migration (per the "Link:" below), following a similar earlier report [2]. It turns out page migration was overlooked; it wasn't failing before only because page migration was not enabled in the initial report's test environment.

David proposed another way [2] to fix this from the sparc64 side, but that patch never landed, and I haven't checked whether any other architecture has similar issues. Let's fix it for now simply by moving the write bit handling to after the dirty bit handling, like what we did before.

Note: this is based on mm-unstable, because the breakage dates back to 6.1 and we're at a very late stage of 6.2 (-rc8), so I assume for this specific case we should target 6.3.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 2e3468778dbe ("mm: remember young/dirty bit for page migrations")
Link: https://lore.kernel.org/all/CADyTPExpEqaJiMGoV+Z6xVgL50ZoMJg49B10LcZ=8eg19u34BA@mail.gmail.com/
Signed-off-by: Peter Xu <[email protected]>
Reported-by: Nick Bowler <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Tested-by: Nick Bowler <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
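Conceptually, the fix reorders the pte reconstruction so the write bit is settled after pte_mkdirty(). A hedged, illustrative fragment of the ordering only (not the literal diff; the condition name is a placeholder):

    /* On sparc64, pte_mkdirty() can also make the pte writable, so apply
     * dirtiness first and decide write access afterwards, as the
     * pre-regression code did. */
    pte = pte_mkdirty(pte);
    if (writable)                   /* hypothetical condition name */
            pte = pte_mkwrite(pte);
    else
            pte = pte_wrprotect(pte);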
2023-02-16  mm: memcontrol: rename memcg_kmem_enabled()  (Roman Gushchin, 5 files changed, -15/+15 lines)

Currently there are two kmem-related helper functions with confusing semantics: memcg_kmem_enabled() and mem_cgroup_kmem_disabled(). The problem is that the obvious expectation, memcg_kmem_enabled() == !mem_cgroup_kmem_disabled(), can be false.

mem_cgroup_kmem_disabled() is similar to mem_cgroup_disabled(): it returns true only if CONFIG_MEMCG_KMEM is not set or kmem accounting is disabled via the boot-time kernel option "cgroup.memory=nokmem". Its value never changes dynamically.

memcg_kmem_enabled() is different: it always returns false until the first non-root memory cgroup comes online (assuming kernel memory accounting is enabled). Its goal is to improve performance on systems without the cgroupfs mounted/memory controller enabled, or on systems with only the root memory cgroup.

To make things more obvious and avoid potential bugs, let's rename memcg_kmem_enabled() to memcg_kmem_online().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Muchun Song <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Dennis Zhou <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
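The distinction can be summarized in a hedged fragment of how a caller might choose between the two helpers (the accounting hook is a hypothetical name):

    /* mem_cgroup_kmem_disabled(): static answer, fixed at boot
     *   (CONFIG_MEMCG_KMEM off, or cgroup.memory=nokmem).
     * memcg_kmem_online() (was memcg_kmem_enabled()): dynamic answer,
     *   false until the first non-root memcg comes online. */
    if (mem_cgroup_kmem_disabled())
            return;                 /* kmem accounting can never happen */

    if (memcg_kmem_online())
            account_kmem_usage();   /* hypothetical accounting hook */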
2023-02-16  mm: percpu: fix incorrect size in pcpu_obj_full_size()  (Yafang Shao, 1 file changed, -2/+4 lines)

The extra space used to store the obj_cgroup membership is only valid when kmemcg is enabled, and kmemcg can be disabled via the kernel parameter "cgroup.memory=nokmem" at boot time. This helper is also used in non-memcg code, for example in the tracepoint, so we should fix it.

This was found by code review while I was implementing bpf memory usage [1]. No real issue occurs in production environments.

[1] https://lwn.net/Articles/921991/

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Acked-by: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Vasily Averin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
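The shape of the corrected helper, approximately (a hedged sketch; the exact guard in the tree may differ):

    /* Only count the obj_cgroup pointer array when kmem accounting can
     * actually be active, i.e. not disabled at boot. */
    static inline size_t pcpu_obj_full_size(size_t size)
    {
            size_t extra_size = 0;

    #ifdef CONFIG_MEMCG_KMEM
            if (!mem_cgroup_kmem_disabled())
                    extra_size = size / PCPU_MIN_ALLOC_SIZE *
                                 sizeof(struct obj_cgroup *);
    #endif
            return size * num_possible_cpus() + extra_size;
    }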
2023-02-16  mm: page_alloc: call panic() when memoryless node allocation fails  (Qi Zheng, 1 file changed, -5/+3 lines)

In free_area_init(), we currently continue running after the allocation of a memoryless node's pgdat fails. However, the subsequent code (such as the zonelist initialization) does not handle the case where NODE_DATA(nid) is NULL, which will cause a panic anyway. Instead, it's better to call panic() directly when the memory allocation fails during system boot.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qi Zheng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  mm: multi-gen LRU: avoid futile retries  (Yu Zhao, 1 file changed, -10/+15 lines)

Recall that the per-node memcg LRU has two generations, and they alternate when the last memcg (of a given node) is moved from one to the other. Each generation is also sharded into multiple bins to improve scalability. A reclaimer starts with a random bin (in the old generation) and, if it fails, retries the remaining bins.

If a reclaimer fails with the last memcg, it should first move this memcg to the young generation, which causes the generations to alternate, and then retry. Otherwise, the retries will be futile because all the other bins are empty.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <[email protected]>
Reported-by: T.J. Mercier <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  migrate_pages: move THP/hugetlb migration support check to simplify code  (Huang Ying, 1 file changed, -47/+36 lines)

This is a code cleanup patch; no functionality change is expected. After the change, the line count is reduced, especially in the long migrate_pages_batch().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Suggested-by: Alistair Popple <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Xin Hao <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  migrate_pages: batch flushing TLB  (Huang Ying, 2 files changed, -4/+21 lines)

TLB flushing can cost quite a few CPU cycles during folio migration in some situations, for example when migrating a folio of a process with multiple active threads running on multiple CPUs. After batching the _unmap and _move stages in migrate_pages(), the TLB flushing can be batched easily with the existing TLB flush batching mechanism. This patch implements that.

We used the following test case to test the patch. On a 2-socket Intel server:

  - Run the pmbench memory accessing benchmark
  - Run `migratepages` to migrate pages of pmbench between node 0 and node 1 back and forth.

With the patch, the TLB flushing IPIs are reduced by 99.1% during the test, and the number of pages migrated successfully per second increases by 291.7%.

Haoxin helped to test the patchset on an ARM64 server with 128 cores and 2 NUMA nodes. Test results show that the page migration performance increases by up to 78%.

NOTE: TLB flushing is batched only for normal folios, not for THP folios, because the overhead of TLB flushing for a THP folio is much lower than that for normal folios (about 1/512 on the x86 platform).

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Tested-by: Xin Hao <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Reviewed-by: Xin Hao <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  migrate_pages: share more code between _unmap and _move  (Huang Ying, 1 file changed, -122/+85 lines)

This is a code cleanup patch to reduce the duplicated code between the _unmap and _move stages of migrate_pages(). No functionality change is expected.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Xin Hao <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  migrate_pages: move migrate_folio_unmap()  (Huang Ying, 1 file changed, -50/+50 lines)

Just move the position of the functions; there is no functionality change. This is to make the next patch easier to review by putting the code near its position in that patch.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Xin Hao <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  migrate_pages: batch _unmap and _move  (Huang Ying, 1 file changed, -25/+189 lines)

In this patch the _unmap and _move stages of folio migration are batched. Previously the loop was:

  for each folio
    _unmap()
    _move()

Now it is:

  for each folio
    _unmap()
  for each folio
    _move()

Based on this, we can batch the TLB flushing and use hardware accelerators to copy folios between the batched _unmap and batched _move stages.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Tested-by: Hyeonggon Yoo <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Xin Hao <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  migrate_pages: split unmap_and_move() to _unmap() and _move()  (Huang Ying, 1 file changed, -41/+128 lines)

This is a preparation patch to batch folio unmapping and moving. In this patch, unmap_and_move() is split into migrate_folio_unmap() and migrate_folio_move(), so we can batch _unmap() and _move() in different loops later. To pass information between unmap and move, the originally unused dst->mapping and dst->private fields are used.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Reviewed-by: Xin Hao <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  migrate_pages: restrict number of pages to migrate in batch  (Huang Ying, 1 file changed, -68/+106 lines)

This is a preparation patch to batch folio unmapping and moving for non-hugetlb folios. With batched folio unmapping, all folios to be migrated are unmapped before their contents and flags are copied. If too many pages' worth of folios were passed to migrate_pages(), the affected processes would be stopped for too long, causing excessive latency. For example, the migrate_pages() syscall will call migrate_pages() with all folios of a process.

To avoid this possible issue, this patch restricts the number of pages to be migrated in a batch to no more than HPAGE_PMD_NR; see the sketch after the trailers. That is, the impact is at the same level as THP migration.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Xin Hao <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
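The restriction amounts to splitting the caller-supplied folio list into bounded chunks before batching. A hedged kernel-style sketch of the idea (migrate_folio_batch() is a hypothetical helper standing in for the batched _unmap + _move path):

    /* Migrate the list in chunks so no more than HPAGE_PMD_NR pages are
     * unmapped at once, bounding how long any folio stays unmapped. */
    LIST_HEAD(batch);
    struct folio *folio, *folio2;
    int nr_pages = 0;

    list_for_each_entry_safe(folio, folio2, from, lru) {
            nr_pages += folio_nr_pages(folio);
            list_move_tail(&folio->lru, &batch);
            if (nr_pages >= HPAGE_PMD_NR) {
                    migrate_folio_batch(&batch);  /* hypothetical helper */
                    nr_pages = 0;
            }
    }
    if (!list_empty(&batch))
            migrate_folio_batch(&batch);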
2023-02-16  migrate_pages: separate hugetlb folios migration  (Huang Ying, 1 file changed, -22/+119 lines)

This is a preparation patch to batch folio unmapping and moving for non-hugetlb folios. Based on that, we can batch the TLB shootdown during folio migration and make it possible to use hardware accelerators for folio copying. In this patch, the migration of hugetlb folios and non-hugetlb folios is separated in migrate_pages() to make it easy to change the non-hugetlb folio migration implementation.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Reviewed-by: Xin Hao <[email protected]>
Cc: Zi Yan <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Alistair Popple <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  migrate_pages: organize stats with struct migrate_pages_stats  (Huang Ying, 1 file changed, -26/+34 lines)

Patch series "migrate_pages(): batch TLB flushing", v5.

Now, migrate_pages() migrates folios one by one, like the following fake code:

  for each folio
    unmap
    flush TLB
    copy
    restore map

If multiple folios are passed to migrate_pages(), there are opportunities to batch the TLB flushing and copying. That is, we can change the code to something like:

  for each folio
    unmap
  for each folio
    flush TLB
  for each folio
    copy
  for each folio
    restore map

The total number of TLB flushing IPIs can be reduced considerably, and we may use a hardware accelerator such as DSA to accelerate the folio copying. So in this patchset, we refactor the migrate_pages() implementation and implement batched TLB flushing. Based on this, hardware-accelerated folio copying can be implemented.

If too many folios are passed to migrate_pages(), a naive batched implementation may unmap too many folios at the same time, increasing the chance that a task has to wait for the migrated folios to be mapped again and hurting latency. To deal with this issue, the maximum number of folios to be unmapped in a batch is restricted to no more than HPAGE_PMD_NR in units of pages; that is, the impact is at the same level as THP migration.

We used the following test to measure the performance impact of the patchset. On a 2-socket Intel server:

  - Run the pmbench memory accessing benchmark
  - Run `migratepages` to migrate pages of pmbench between node 0 and node 1 back and forth.

With the patchset, the TLB flushing IPIs are reduced by 99.1% during the test, and the number of pages migrated successfully per second increases by 291.7%.

Xin Hao helped to test the patchset on an ARM64 server with 128 cores and 2 NUMA nodes. Test results show that the page migration performance increases by up to 78%.

This patch (of 9):

Define struct migrate_pages_stats to organize the various statistics in migrate_pages(). This makes it easier to collect and consume the statistics in multiple functions. This will be needed by the following patches in the series.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Alistair Popple <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Reviewed-by: Xin Hao <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Hyeonggon Yoo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-16  lib/stacktrace, kasan, kmsan: rework extra_bits interface  (Andrey Konovalov, 2 files changed, -4/+8 lines)

The current implementation of the extra_bits interface is confusing: passing extra_bits to __stack_depot_save makes it seem that the extra bits are somehow stored in stack depot. In reality, they are only embedded into a stack depot handle and are not used within stack depot.

Drop the extra_bits argument from __stack_depot_save and instead provide a new stack_depot_set_extra_bits function (similar to the existing stack_depot_get_extra_bits) that saves extra bits into a stack depot handle. Update the callers of __stack_depot_save to use the new interface.

This change also fixes a minor issue in the old code: __stack_depot_save did not return NULL if saving the stack trace failed while extra_bits was used.

Link: https://lkml.kernel.org/r/317123b5c05e2f82854fc55d8b285e0869d3cb77.1676063693.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
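A hedged before/after sketch of a caller (argument lists abbreviated to the relevant parts):

    /* Before: extra bits were passed into the save call itself.
     *   handle = __stack_depot_save(entries, nr, extra_bits, flags, true);
     * After: save first, then fold the extra bits into the handle. */
    handle = __stack_depot_save(entries, nr, flags, true);
    handle = stack_depot_set_extra_bits(handle, extra_bits);

    /* Later, recover them from the handle: */
    extra = stack_depot_get_extra_bits(handle);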
2023-02-16  lib/stackdepot, mm: rename stack_depot_want_early_init  (Andrey Konovalov, 3 files changed, -4/+4 lines)

Rename stack_depot_want_early_init to stack_depot_request_early_init. The old name is confusing, as it hints at returning some kind of intention held by stack depot; the new name reflects that this function requests an action from stack depot. No functional changes.

[[email protected]: update mm/kmemleak.c]
Link: https://lkml.kernel.org/r/359f31bf67429a06e630b4395816a967214ef753.1676063693.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
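Illustrative usage after the rename (a hedged fragment; the caller and config option names are hypothetical):

    /* A subsystem asks stack depot to initialize early, before
     * stack_depot_early_init() runs during boot. */
    void __init my_subsys_setup(void)                   /* hypothetical */
    {
            if (IS_ENABLED(CONFIG_MY_SUBSYS_STACKTRACE)) /* hypothetical */
                    stack_depot_request_early_init();
    }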
2023-02-16  mm/MADV_COLLAPSE: set EAGAIN on unexpected page refcount  (Zach O'Keefe, 1 file changed, -0/+1 lines)

During collapse, in a few places we check whether a given small page has any unaccounted references. If the refcount on the page doesn't match our expectations, there must be an unknown user concurrently interested in the page, and so it's not safe to move the contents elsewhere. However, the unaccounted pins are likely an ephemeral state.

In this situation, MADV_COLLAPSE returns -EINVAL when it should return -EAGAIN. This could cause userspace to conclude that the syscall failed when it could in fact succeed on retry.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Zach O'Keefe <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
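For userspace, the distinction matters because EAGAIN is retryable. A minimal sketch of a retrying caller (retry count and backoff are illustrative choices, not prescribed by the kernel):

    #include <errno.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MADV_COLLAPSE
    #define MADV_COLLAPSE 25        /* from <asm-generic/mman-common.h> */
    #endif

    /* Try to collapse [addr, addr+len) into huge pages, retrying on the
     * transient-refcount case that now reports EAGAIN. */
    static int collapse_with_retry(void *addr, size_t len)
    {
            for (int tries = 0; tries < 5; tries++) {
                    if (madvise(addr, len, MADV_COLLAPSE) == 0)
                            return 0;
                    if (errno != EAGAIN)
                            return -1;      /* hard failure, e.g. EINVAL */
                    usleep(1000);           /* brief backoff, then retry */
            }
            return -1;
    }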
2023-02-16  mm/filemap: fix page end in filemap_get_read_batch  (Qian Yingjin, 1 file changed, -2/+3 lines)

I was running traces of the read code against a RAID storage system to understand why read requests were being misaligned against the underlying RAID strips. I found that the page end offset calculation in filemap_get_read_batch() was off by one: when a read is submitted with end offset 1048575, it calculates the end page for the read as 256 when it should be 255. "last_index" is the index of the page beyond the end of the read, and it should be skipped when getting a batch of pages for the read in filemap_get_read_batch().

The simple patch below fixes the problem. This code was introduced in kernel 5.12.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: cbd59c48ae2b ("mm/filemap: use head pages in generic_file_buffered_read")
Signed-off-by: Qian Yingjin <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
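The arithmetic, concretely: with 4 KiB pages, a read whose final byte is at offset 1048575 ends on page index 255, while last_index (the first page beyond the read) is 256. A runnable check of the numbers from the report:

    #include <stdio.h>

    int main(void)
    {
            unsigned long end_offset = 1048575;  /* last byte of the read */
            unsigned long page_shift = 12;       /* 4 KiB pages */

            /* index of the page containing the final byte: 255 */
            printf("end page   = %lu\n", end_offset >> page_shift);
            /* first page beyond the read, to be skipped: 256 */
            printf("last_index = %lu\n", (end_offset + 1) >> page_shift);
            return 0;
    }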
2023-02-13  mm: fix typo in __vm_enough_memory warning  (Jakub Wilk, 1 file changed, -1/+1 lines)

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jakub Wilk <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Cc: Kefeng Wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/damon/dbgfs: print DAMON debugfs interface deprecation message  (SeongJae Park, 1 file changed, -0/+19 lines)

The DAMON debugfs interface was announced to be deprecated after the >v5.15 LTS kernel is released, and v6.1.y has been announced as an LTS [1].

Though the announcement has been out for a while, some people might not have noticed it so far, and some users could depend on the interface and have problems moving to the alternative (the DAMON sysfs interface). For such cases, print a DAMON debugfs interface deprecation warning, with contacts to ask for help, whenever any DAMON debugfs interface file is opened.

[1] https://git.kernel.org/pub/scm/docs/kernel/website.git/commit/?id=332e9121320bc7461b2d3a79665caf153e51732c

[[email protected]: split DAMON debugfs file open warning message, per Randy]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/damon/Kconfig: add DAMON debugfs interface deprecation notice  (SeongJae Park, 1 file changed, -3/+4 lines)

The DAMON debugfs interface was announced to be deprecated after the >v5.15 LTS kernel is released, and v6.1.y has been announced as an LTS [1].

Though the announcement has been out for a while, some people might not have noticed it so far, and some users could depend on the interface and have problems moving to the alternative (the DAMON sysfs interface). For such cases, note the DAMON debugfs interface as deprecated, with contacts to ask for help, in the Kconfig.

[1] https://git.kernel.org/pub/scm/docs/kernel/website.git/commit/?id=332e9121320bc7461b2d3a79665caf153e51732c

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/migrate: convert putback_movable_pages() to use folios  (Vishal Moola (Oracle), 1 file changed, -23/+23 lines)

Removes 6 calls to compound_head(), and replaces putback_movable_page() with putback_movable_folio() as well.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/migrate: convert isolate_movable_page() to use folios  (Vishal Moola (Oracle), 1 file changed, -19/+20 lines)

Removes 6 calls to compound_head() and prepares the function to take in a folio instead of a page argument.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/migrate: add folio_movable_ops()  (Vishal Moola (Oracle), 1 file changed, -1/+1 lines)

folio_movable_ops() does the same as page_movable_ops(), except that it uses folios instead of pages. This function will help make folio conversions in migrate.c more readable.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/mempolicy: convert migrate_page_add() to migrate_folio_add()  (Vishal Moola (Oracle), 1 file changed, -19/+20 lines)

Replace migrate_page_add() with migrate_folio_add(). migrate_folio_add() does the same as migrate_page_add(), but takes in a folio instead of a page. This removes a couple of calls to compound_head().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/mempolicy: convert queue_pages_required() to queue_folio_required()  (Vishal Moola (Oracle), 1 file changed, -6/+6 lines)

Replace queue_pages_required() with queue_folio_required(). queue_folio_required() does the same as queue_pages_required(), except it takes in a folio instead of a page.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: "Yin, Fengwei" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/mempolicy: convert queue_pages_hugetlb() to queue_folios_hugetlb()  (Vishal Moola (Oracle), 1 file changed, -11/+18 lines)

This change is in preparation for the conversion of queue_pages_required() to queue_folio_required() and migrate_page_add() to migrate_folio_add().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: "Yin, Fengwei" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/mempolicy: convert queue_pages_pte_range() to queue_folios_pte_range()  (Vishal Moola (Oracle), 1 file changed, -14/+14 lines)

This function now operates on folios associated with ptes instead of pages. This change is in preparation for the conversion of queue_pages_required() to queue_folio_required() and migrate_page_add() to migrate_folio_add().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: "Yin, Fengwei" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/mempolicy: convert queue_pages_pmd() to queue_folios_pmd()  (Vishal Moola (Oracle), 1 file changed, -12/+12 lines)

The function now operates on a folio instead of the page associated with a pmd. This change is in preparation for the conversion of queue_pages_required() to queue_folio_required() and migrate_page_add() to migrate_folio_add().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: "Yin, Fengwei" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert hugetlb_wp() to take in a folio  (Sidhartha Kumar, 1 file changed, -16/+16 lines)

Change the pagecache_page argument of hugetlb_wp() to pagecache_folio. This replaces a call to find_lock_page() with filemap_lock_folio().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reported-by: [email protected]
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert hugetlb_add_to_page_cache to take in a folio  (Sidhartha Kumar, 1 file changed, -5/+4 lines)

Every caller of hugetlb_add_to_page_cache() is now passing in &folio->page; change the function to take in a folio directly and clean up the call sites.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert restore_reserve_on_error to take in a folio  (Sidhartha Kumar, 1 file changed, -11/+10 lines)

Every caller of restore_reserve_on_error() is now passing in &folio->page; change the function to take in a folio directly and clean up the call sites.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()  (Sidhartha Kumar, 3 files changed, -106/+107 lines)

Change alloc_huge_page() to alloc_hugetlb_folio() by changing all callers to handle the function's new folio return type. In this conversion, alloc_huge_page_vma() is also changed to alloc_hugetlb_folio_vma(), and hugepage_add_new_anon_rmap() is changed to take in a folio directly. Many additions of '&folio->page' are cleaned up in subsequent patches.

hugetlbfs_fallocate() is also refactored to use the RCU + page_cache_next_miss() API.

Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Mike Kravetz <[email protected]>
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Sidhartha Kumar <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert putback_active_hugepage to take in a folio  (Sidhartha Kumar, 2 files changed, -8/+8 lines)

Convert putback_active_hugepage() to folio_putback_active_hugetlb(); this removes one user of the Huge Page macros which take in a page. The callers in migrate.c are also cleaned up by being able to directly use the src and dst folio variables.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert hugetlbfs_pagecache_present() to folios  (Sidhartha Kumar, 1 file changed, -9/+7 lines)

Refactor hugetlbfs_pagecache_present() to avoid getting and dropping a refcount on a page. Use RCU and page_cache_next_miss() instead.

Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Sidhartha Kumar <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert hugetlb_install_page to folios  (Sidhartha Kumar, 1 file changed, -7/+7 lines)

Patch series "convert hugetlb fault functions to folios", v2.

This series converts the hugetlb page faulting functions to operate on folios. These include hugetlb_no_page(), hugetlb_wp(), copy_hugetlb_page_range(), and hugetlb_mcopy_atomic_pte().

This patch (of 8):

Change hugetlb_install_page() to hugetlb_install_folio(). This removes one user of the Huge Page flag macros which take in a page.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert demote_free_huge_page to folios  (Sidhartha Kumar, 1 file changed, -18/+17 lines)

Change demote_free_huge_page() to demote_free_hugetlb_folio(), and change demote_pool_huge_page() to pass in a folio.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert restore_reserve_on_error() to folios  (Sidhartha Kumar, 1 file changed, -13/+14 lines)

Use the hugetlb folio flag macros inside restore_reserve_on_error() and update the comments to reflect the use of folios.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert alloc_migrate_huge_page to folios  (Sidhartha Kumar, 2 files changed, -10/+13 lines)

Change alloc_huge_page_nodemask() to alloc_hugetlb_folio_nodemask() and alloc_migrate_huge_page() to alloc_migrate_hugetlb_folio(). Both functions now return a folio rather than a page.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: increase use of folios in alloc_huge_page()  (Sidhartha Kumar, 2 files changed, -23/+18 lines)

Change hugetlb_cgroup_commit_charge{,_rsvd}(), dequeue_huge_page_vma() and alloc_buddy_huge_page_with_mpol() to use folios, so that alloc_huge_page() operates cleanly on folios up until its return.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert alloc_surplus_huge_page() to folios  (Sidhartha Kumar, 1 file changed, -13/+14 lines)

Change alloc_surplus_huge_page() to alloc_surplus_hugetlb_folio() and update its callers.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert dequeue_hugetlb_page functions to folios  (Sidhartha Kumar, 1 file changed, -26/+30 lines)

dequeue_huge_page_node_exact() is changed to dequeue_hugetlb_folio_node_exact() and dequeue_huge_page_nodemask() is changed to dequeue_hugetlb_folio_nodemask(). Update their callers to pass in a folio.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert __update_and_free_page() to folios  (Sidhartha Kumar, 1 file changed, -6/+6 lines)

Change __update_and_free_page() to __update_and_free_hugetlb_folio() by changing its callers to pass in a folio.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/hugetlb: convert isolate_hugetlb to folios  (Sidhartha Kumar, 6 files changed, -13/+13 lines)

Patch series "continue hugetlb folio conversion", v3.

This series continues the conversion of core hugetlb functions to use folios. It converts many helper functions in the hugetlb fault path, in preparation for another series converting the hugetlb fault code paths to operate on folios.

This patch (of 8):

Convert isolate_hugetlb() to take in a folio and convert its callers to pass a folio. Using page_folio() to convert the callers is safe, as isolate_hugetlb() operates on a head page.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Sidhartha Kumar <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-02-13  mm/khugepaged: fix invalid page access in release_pte_pages()  (Vishal Moola (Oracle), 1 file changed, -4/+10 lines)

release_pte_pages() converts from a pfn to a folio by using pfn_folio(). If the pte is not mapped, pfn_folio() will result in undefined behavior, which ends up causing a kernel panic [1]. Only call pfn_folio() once we have validated that the pte is both valid and mapped, to fix the issue.

[1] https://lore.kernel.org/linux-mm/[email protected]/

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vishal Moola (Oracle) <[email protected]>
Fixes: 9bdfeea46f49 ("mm/khugepaged: convert release_pte_pages() to use folios")
Reported-by: Marek Szyprowski <[email protected]>
Tested-by: Marek Szyprowski <[email protected]>
Debugged-by: Alexandre Ghiti <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
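The shape of the fix, as a hedged kernel-style fragment (not the literal diff; the release step is a placeholder name):

    pte_t pteval = *_pte;

    /* A none or non-present pte carries no meaningful pfn; deriving a
     * folio from it via pfn_folio() is undefined behavior. */
    if (pte_none(pteval) || !pte_present(pteval))
            continue;
    folio = pfn_folio(pte_pfn(pteval));     /* now safe to use */
    release_folio(folio);                   /* hypothetical release step */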
2023-02-13  Merge tag 'mm-hotfixes-stable-2023-02-13-13-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  (Linus Torvalds, 9 files changed, -16/+57 lines)

Pull misc fixes from Andrew Morton:
 "Twelve hotfixes, mostly against mm/. Five of these fixes are cc:stable."

* tag 'mm-hotfixes-stable-2023-02-13-13-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem
  scripts/gdb: fix 'lx-current' for x86
  lib: parser: optimize match_NUMBER apis to use local array
  mm: shrinkers: fix deadlock in shrinker debugfs
  mm: hwpoison: support recovery from ksm_might_need_to_copy()
  kasan: fix Oops due to missing calls to kasan_arch_is_ready()
  revert "squashfs: harden sanity check in squashfs_read_xattr_id_table"
  fsdax: dax_unshare_iter() should return a valid length
  mm/gup: add folio to list when folio_isolate_lru() succeed
  aio: fix mremap after fork null-deref
  mailmap: add entry for Alexander Mikhalitsyn
  mm: extend max struct page size for kmsan
2023-02-13  mm: Remove get_kernel_pages()  (Ira Weiny, 1 file changed, -30/+0 lines)

The only caller of get_kernel_pages() [shm_get_kernel_pages()] has been updated to not need it. Remove get_kernel_pages().

Cc: Mel Gorman <[email protected]>
Cc: Al Viro <[email protected]>
Cc: "Fabio M. De Francesco" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Andrew Morton <[email protected]>
Acked-by: John Hubbard <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Acked-by: Andrew Morton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sumit Garg <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>