|
During NUMA scanning, make sure only the relevant VMAs of a task are
scanned.
Before:
All the tasks of a process participate in scanning a VMA even if they
never access it in their lifespan.
Now:
Except for the first few unconditional scans, if a task does not touch
a VMA (excluding false-positive cases of PID collisions), it no longer
scans that VMA.
Logic used (see the sketch below):
1) The lower 6 bits of the PID are used to mark an active bit in the
VMA's numab state during a fault, to remember the PIDs accessing the
VMA. (Thanks Mel)
2) Subsequently, in the scan path, VMA scanning is skipped if the
current PID has not accessed the VMA.
3) The first two scans are allowed unconditionally, to preserve the
earlier scanning behaviour.
Acknowledgements to Bharata B Rao <[email protected]> for the initial
patch to store PID information, and to Peter Zijlstra <[email protected]>
for the use of test-and-set bit.
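As an illustration, a rough sketch of the fault- and scan-path logic
described above. The field and helper names (numab_state, access_pids,
vma_is_accessed()) approximate the patches and may differ in detail:

  /* Fault path: remember that the current task touched this VMA.
   * Only the lower bits of the PID are kept, so PID collisions can
   * produce false positives. */
  static void vma_set_access_pid_bit(struct vm_area_struct *vma)
  {
          unsigned int pid_bit = current->pid % BITS_PER_LONG;

          if (!test_bit(pid_bit, &vma->numab_state->access_pids))
                  __set_bit(pid_bit, &vma->numab_state->access_pids);
  }

  /* Scan path: skip VMAs the current task never faulted on. */
  static bool vma_is_accessed(struct vm_area_struct *vma)
  {
          unsigned int pid_bit = current->pid % BITS_PER_LONG;

          return test_bit(pid_bit, &vma->numab_state->access_pids);
  }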
Link: https://lkml.kernel.org/r/092f03105c7c1d3450f4636b1ea350407f07640e.1677672277.git.raghavendra.kt@amd.com
Signed-off-by: Raghavendra K T <[email protected]>
Suggested-by: Mel Gorman <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Disha Talreja <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "sched/numa: Enhance vma scanning", v3.
The patchset implements one of the enhancements to NUMA VMA scanning
suggested by Mel. This is a continuation of [3], reposting the rebased
patchset against the akpm mm-unstable tree (March 1).
In the existing mechanism, the scan period is derived from per-thread
stats. Process Adaptive autoNUMA [1] proposed gathering NUMA fault
stats at the per-process level to capture application behaviour better.
During the course of that discussion, Mel proposed several ideas to
enhance current NUMA balancing. One of the suggestions was the
following:
Track which threads access a VMA. The suggestion was to use an unsigned
long pid_mask and use the lower bits to tag approximately which threads
access a VMA. Skip VMAs that did not trap a fault. This would be
approximate because of PID collisions, but would reduce scanning of
areas the thread is not interested in. The intent is not to penalize
threads that have no interest in the VMA, thus reducing scanning
overhead.
V3 changes are mostly based on PeterZ's comments (details below in the
changes).
Summary of patchset:
The current patchset implements:
1. Delaying the VMA scanning logic for newly created VMAs, so that the
additional scanning overhead is not incurred for short-lived tasks
(implementation by Mel)
2. Storing the information on tasks accessing a VMA in 2 windows,
cleared regularly at a (4*sysctl_numa_balancing_scan_delay) interval.
This interval was derived experimentally (suggested by PeterZ) to
balance frequent clearing against obsolete access data
3. Using hash_32() to encode the index of the task accessing the VMA
4. Using the VMA's access information to skip scanning for tasks
which have not accessed the VMA
Changes since V2:
patch1:
- Renaming of structure, macro to function,
- Add explanation to heuristics
- Adding more details from result (PeterZ)
Patch2:
- Usage of test and set bit (PeterZ)
- Move storing access PID info to numa_migrate_prep()
- Add a note on fairness among tasks allowed to scan
(PeterZ)
Patch3:
- Maintain two windows of access PID information
(PeterZ supported the implementation and gave the idea to extend
to N if needed)
Patch4:
- Apply hash_32 function to track VMA accessing PIDs (PeterZ)
Changes since RFC V1:
- Include Mel's vma scan delay patch
- Change the accessing pid store logic (Thanks Mel)
- Fencing structure / code to NUMA_BALANCING (David, Mel)
- Adding clearing access PID logic (Mel)
- Descriptive change log (Mike Rapoport)
Things to ponder over:
==========================================
- Improvement to the logic of clearing accessing PIDs (discussed in
detail in patch 3 itself; done in this patchset by implementing a
2-window history)
- The current scan period is not changed in the patchset, so we do see
frequent attempts to scan. Relaxing the scan period dynamically could
improve results further.
[1] sched/numa: Process Adaptive autoNUMA
Link: https://lore.kernel.org/lkml/[email protected]/T/
[2] RFC V1 Link:
https://lore.kernel.org/all/[email protected]/
[3] V2 Link:
https://lore.kernel.org/lkml/[email protected]/
Results:
Summary: A huge autonuma cost reduction is seen in mmtests. The kernbench
improvement is more than 5%, and there is a huge system time (80%+)
improvement in the mmtests autonuma benchmark.
(dbench had too large a standard deviation to post)
kernbench
===========
6.2.0-mmunstable-base 6.2.0-mmunstable-patched
Amean user-256 22002.51 ( 0.00%) 22649.95 * -2.94%*
Amean syst-256 10162.78 ( 0.00%) 8214.13 * 19.17%*
Amean elsp-256 160.74 ( 0.00%) 156.92 * 2.38%*
Duration User 66017.43 67959.84
Duration System 30503.15 24657.03
Duration Elapsed 504.61 493.12
6.2.0-mmunstable-base 6.2.0-mmunstable-patched
Ops NUMA alloc hit 1738835089.00 1738780310.00
Ops NUMA alloc local 1738834448.00 1738779711.00
Ops NUMA base-page range updates 477310.00 392566.00
Ops NUMA PTE updates 477310.00 392566.00
Ops NUMA hint faults 96817.00 87555.00
Ops NUMA hint local faults % 10150.00 2192.00
Ops NUMA hint local percent 10.48 2.50
Ops NUMA pages migrated 86660.00 85363.00
Ops AutoNUMA cost 489.07 442.14
autonumabench
===============
6.2.0-mmunstable-base 6.2.0-mmunstable-patched
Amean syst-NUMA01 399.50 ( 0.00%) 52.05 * 86.97%*
Amean syst-NUMA01_THREADLOCAL 0.21 ( 0.00%) 0.22 * -5.41%*
Amean syst-NUMA02 0.80 ( 0.00%) 0.78 * 2.68%*
Amean syst-NUMA02_SMT 0.65 ( 0.00%) 0.68 * -3.95%*
Amean elsp-NUMA01 313.26 ( 0.00%) 313.11 * 0.05%*
Amean elsp-NUMA01_THREADLOCAL 1.06 ( 0.00%) 1.08 * -1.76%*
Amean elsp-NUMA02 3.19 ( 0.00%) 3.24 * -1.52%*
Amean elsp-NUMA02_SMT 3.72 ( 0.00%) 3.61 * 2.92%*
Duration User 396433.47 324835.96
Duration System 2808.70 376.66
Duration Elapsed 2258.61 2258.12
6.2.0-mmunstable-base 6.2.0-mmunstable-patched
Ops NUMA alloc hit 59921806.00 49623489.00
Ops NUMA alloc miss 0.00 0.00
Ops NUMA interleave hit 0.00 0.00
Ops NUMA alloc local 59920880.00 49622594.00
Ops NUMA base-page range updates 152259275.00 50075.00
Ops NUMA PTE updates 152259275.00 50075.00
Ops NUMA PMD updates 0.00 0.00
Ops NUMA hint faults 154660352.00 39014.00
Ops NUMA hint local faults % 138550501.00 23139.00
Ops NUMA hint local percent 89.58 59.31
Ops NUMA pages migrated 8179067.00 14147.00
Ops AutoNUMA cost 774522.98 195.69
This patch (of 4):
Currently, whenever a new task is created, we wait for
sysctl_numa_balancing_scan_delay to avoid unnecessary scanning overhead.
Extend the same logic to new or very short-lived VMAs, as sketched below.
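A minimal sketch of that check, assuming a per-VMA numab_state that
records the earliest allowed scan time (names approximate the patch):

  /* Skip VMAs that are younger than the scan delay; next_scan is set
   * on first sight to jiffies +
   * msecs_to_jiffies(sysctl_numa_balancing_scan_delay). */
  static bool vma_scan_delayed(struct vm_area_struct *vma)
  {
          return vma->numab_state &&
                 time_before(jiffies, vma->numab_state->next_scan);
  }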
[[email protected]: add initialization in vm_area_dup()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/7a6fbba87c8b51e67efd3e74285bb4cb311a16ca.1677672277.git.raghavendra.kt@amd.com
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Raghavendra K T <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Disha Talreja <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
vma->lock being part of the vm_area_struct causes a performance
regression during page faults, because under contention its count and
owner fields are constantly updated, and having other parts of
vm_area_struct that are used during page fault handling next to them
causes constant cache line bouncing. Fix that by moving the lock
outside of the vm_area_struct.
All attempts to keep vma->lock inside vm_area_struct in a separate
cache line still produce a performance regression, especially on NUMA
machines. The smallest regression was achieved when the lock was placed
in the fourth cache line, but that bloats vm_area_struct to 256 bytes.
Considering performance and memory impact, a separate lock looks like
the best option. It increases the memory footprint of each VMA, but
that can be optimized later if the new size causes issues. Note that
after this change vma_init() no longer allocates or initializes
vma->lock. A number of drivers allocate a pseudo VMA on the stack, but
they never use the VMA's lock, so it does not need to be allocated.
Future drivers that might need the VMA lock should use
vm_area_alloc()/vm_area_free() to allocate the VMA.
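A minimal sketch of the resulting layout, assuming the separately
allocated lock keeps the rw_semaphore described by this series:

  struct vma_lock {
          struct rw_semaphore lock;
  };

  struct vm_area_struct {
          /* ... hot fields used during page fault handling ... */

          /* The lock lives in its own allocation, so contended updates
           * to its count/owner no longer bounce the cache lines that
           * hold the hot VMA fields. */
          struct vma_lock *vm_lock;
  };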
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
call_rcu() can take a long time when callback offloading is enabled.
Its use in vm_area_free() can cause regressions in the exit path when
multiple VMAs are being freed.
Because exit_mmap() is called only after the last mm user drops its
refcount, the page fault handlers can't be racing with it. Any other
possible users, like the oom-reaper or process_mrelease, are already
synchronized using mmap_lock. Therefore exit_mmap() can free VMAs
directly, without the use of call_rcu().
Expose __vm_area_free() and use it from exit_mmap() to avoid possible
call_rcu() floods and performance regressions caused by it.
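A sketch of the resulting split (simplified from the actual change):

  /* Free immediately; safe only when no RCU readers can still be
   * walking the tree, as in exit_mmap(). */
  void __vm_area_free(struct vm_area_struct *vma);

  void vm_area_free(struct vm_area_struct *vma)
  {
  #ifdef CONFIG_PER_VMA_LOCK
          /* Defer until concurrent RCU lookups are done. */
          call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
  #else
          __vm_area_free(vma);
  #endif
  }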
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add a new CONFIG_PER_VMA_LOCK_STATS config option to dump extra
statistics about page fault handling under VMA lock.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add a new flag to distinguish page faults handled under protection of
per-vma lock.
[[email protected]: document FAULT_FLAG_VMA_LOCK flag]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reviewed-by: Laurent Dufour <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Howells <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Introduce the lock_vma_under_rcu() function to look up and lock a VMA
during page fault handling. When the VMA is not found, can't be locked,
or changes after being locked, the function returns NULL. The lookup is
performed under RCU protection to prevent the found VMA from being
destroyed before the VMA lock is acquired. VMA lock statistics are
updated according to the results. For now only anonymous VMAs can be
searched this way; in other cases the function returns NULL.
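In outline, the lookup works roughly as follows (a simplified sketch;
the real function performs additional checks):

  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
                                            unsigned long address)
  {
          MA_STATE(mas, &mm->mm_mt, address, address);
          struct vm_area_struct *vma;

          rcu_read_lock();
          vma = mas_walk(&mas);
          if (!vma || !vma_is_anonymous(vma))
                  goto inval;
          if (!vma_start_read(vma))       /* fails if write-locked */
                  goto inval;
          /* Recheck: the VMA may have been detached or moved while we
           * were acquiring the lock. */
          if (vma->detached ||
              address < vma->vm_start || address >= vma->vm_end) {
                  vma_end_read(vma);
                  goto inval;
          }
          rcu_read_unlock();
          return vma;
  inval:
          rcu_read_unlock();
          return NULL;
  }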
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The per-VMA locking mechanism will search for a VMA under RCU
protection and then, after locking it, has to ensure it was not removed
from the VMA tree after being found. To make this check efficient,
introduce a vma->detached flag to mark VMAs which were removed from the
VMA tree.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Protect the VMA from a concurrent page fault handler while collapsing a
huge page. The page fault handler needs a stable PMD to use the PTL and
relies on the per-VMA lock to prevent concurrent PMD changes.
pmdp_collapse_flush(), set_huge_pmd() and collapse_and_free_pmd() can
modify a PMD, which will not be detected by a page fault handler
without proper locking.
Before this patch, page tables can be walked under any one of the
mmap_lock, the mapping lock, and the anon_vma lock; so when khugepaged
unlinks and frees page tables, it must ensure that all of those either are
locked or don't exist. This patch adds a fourth lock under which page
tables can be traversed, and so khugepaged must also lock out that one.
[[email protected]: vm_lock/i_mmap_rwsem inversion in retract_page_tables]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: build fix]
Link: https://lkml.kernel.org/r/CAJuCfpFjWhtzRE1X=J+_JjgJzNKhq-=JT8yTBSTHthwp0pqWZw@mail.gmail.com
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Updates to vm_flags have to be done with the VMA marked as being
written to, in order to prevent concurrent page faults or other
modifications.
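A sketch of such a modifier, assuming the vma_start_write() primitive
from this series:

  static inline void vm_flags_set(struct vm_area_struct *vma,
                                  vm_flags_t flags)
  {
          vma_start_write(vma);   /* mark the VMA write-locked first */
          ACCESS_PRIVATE(vma, __vm_flags) |= flags;
  }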
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Introduce per-VMA locking. The lock implementation relies on per-vma
and per-mm sequence counters to note exclusive locking (the read-lock
fast path is sketched after this list):
- read lock - (implemented by vma_start_read) requires the vma
(vm_lock_seq) and mm (mm_lock_seq) sequence counters to differ.
If they match then there must be a vma exclusive lock held somewhere.
- read unlock - (implemented by vma_end_read) is a trivial vma->lock
unlock.
- write lock - (vma_start_write) requires the mmap_lock to be held
exclusively and the current mm counter is assigned to the vma counter.
This allows multiple vmas to be locked under a single mmap_lock
write lock (e.g. during vma merging). The vma counter is modified
under the exclusive vma lock.
- write unlock - (vma_end_write_all) is a batch release of all vma
locks held. It doesn't pair with a specific vma_start_write! It is
done before the exclusive mmap_lock is released by incrementing the mm
sequence counter (mm_lock_seq).
- write downgrade - if the mmap_lock is downgraded to the read lock,
all vma write locks are released as well (effectively the same as
write unlock).
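A condensed sketch of the read-lock fast path (simplified):

  static inline bool vma_start_read(struct vm_area_struct *vma)
  {
          /* Matching counters mean a writer holds the vma lock. */
          if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
                  return false;

          if (!down_read_trylock(&vma->vm_lock->lock))
                  return false;

          /* Recheck under the lock: a writer may have raced in. */
          if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq)) {
                  up_read(&vma->vm_lock->lock);
                  return false;
          }
          return true;
  }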
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Move mmap_lock assert function definitions up so that they can be used by
other mmap_lock routines.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This prepares for page fault handling under the VMA lock, looking up
VMAs under the protection of an rcu read lock instead of the usual mmap
read lock.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Michel Lespinasse <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
After commit cb6c33d4dc09 ("cma: tracing: print alloc result in
trace_cma_alloc_finish"), cma_alloc_class has only one event which is
cma_alloc_busy_retry. So we can remove the cma_alloc_class.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wenchao Hao <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]>
Cc: Feilong Lin <[email protected]>
Cc: Hongxiang Lou <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Having previously laid the foundation for converting vread() to an
iterator function, pull the trigger and do so.
This patch attempts to provide minimal refactoring and to reflect the
existing logic as best we can; for example, we continue to zero
portions of memory not read, as before.
Overall, there should be no functional difference other than a performance
improvement in /proc/kcore access to vmalloc regions.
Now that we have eliminated the need for a bounce buffer in
read_kcore_iter(), we dispense with it, and try to write to user memory
optimistically but with faults disabled via
copy_page_to_iter_nofault(). We already have preemption disabled by
holding a spin lock. We continue faulting pages in until the operation
is complete.
Additionally, we must account for the fact that at any point a copy may
fail (most likely because a fault could not be taken); in that case we
exit, indicating fewer bytes retrieved than expected.
[[email protected]: fix sparc64 warning]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: redo Stephen's sparc build fix]
Link: https://lkml.kernel.org/r/8506cbc667c39205e65a323f750ff9c11a463798.1679566220.git.lstoakes@gmail.com
[[email protected]: unbreak uio.h includes]
Link: https://lkml.kernel.org/r/941f88bc5ab928e6656e1e2593b91bf0f8c81e1b.1679511146.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Liu Shixin <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Provide a means to copy a page to user space from an iterator, aborting if
a page fault would occur. This supports compound pages, but may be passed
a tail page with an offset extending further into the compound page, so we
cannot pass a folio.
This allows this function to be called from atomic context, and to
_try_ to write to user pages if they are faulted in, aborting if not.
The function does not use _copy_to_iter(), in order to avoid the
might_fault() annotation; this is similar to
copy_page_from_iter_atomic().
This is being added so that an iterable form of vread() can be
implemented while holding spinlocks.
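The intended call pattern looks roughly like this (a sketch; the actual
read_kcore_iter() loop differs in detail):

  /* Called with a spinlock held, so a fault must abort the copy
   * rather than sleep. */
  size_t copied = copy_page_to_iter_nofault(page, offset, len, iter);

  if (copied < len) {
          /* The user page was not faulted in: drop the lock, fault it
           * in (e.g. fault_in_iov_iter_writeable()), and retry. */
  }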
Link: https://lkml.kernel.org/r/19734729defb0f498a76bdec1bef3ac48a3af3e8.1679511146.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Liu Shixin <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The normal page init path frees pages during boot in MAX_ORDER chunks,
but the deferred page init path does it in pageblock-sized blocks.
Change the deferred page init path to work in MAX_ORDER blocks.
For cases when MAX_ORDER is larger than pageblock, set migrate type to
MIGRATE_MOVABLE for all pageblocks covered by the page.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This functionality's sole user, the drm ttm module, removed support for it
in commit 0d979509539e ("drm/ttm: remove ttm_bo_vm_insert_huge()") as the
whole approach is currently unworkable without a PMD/PUD special bit and
updates to GUP.
Link: https://lkml.kernel.org/r/604c2ad79659d4b8a6e3e1611c6219d5d3233988.1678661628.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Aaron Tomlin <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: "Russell King (Oracle)" <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "Remove drm/ttm-specific mm changes".
Functionality was added specifically for the DRM TTM driver to support
mapping memory for VM_MIXEDMAP VMAs with customised protection flags,
however this has now been rolled back as issues were found with this
approach.
This series removes the mm changes too, retaining some of the useful
comments.
This patch (of 3):
The sole user of vmf_insert_mixed_prot(), the drm ttm module, stopped
using this in commit f91142c62161 ("drm/ttm: nuke VM_MIXEDMAP on BO
mappings v3") citing use of VM_MIXEDMAP in this case being terribly
broken.
Remove this now-dead code and references to it, but retain the useful
description of the prot != vma->vm_page_prot case, moving it to
vmf_insert_pfn_prot() instead.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/a069644388e6f1593a7020d15840e6fc9f39bcaf.1678661628.git.lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <[email protected]>
Cc: Christian König <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Aaron Tomlin <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: "Russell King (Oracle)" <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Currently the memtest results are only presented in dmesg.
When running a large fleet of devices without ECC RAM, it's currently
not easy to do bulk monitoring for memory corruption. You have to parse
dmesg, but that's a ring buffer, so the error might disappear after
some time. In general, I do not consider dmesg to be a great API for
querying RAM status.
In several companies I've seen such errors remain undetected and cause
issues for way too long. So I think it makes sense to provide a
monitoring API, so that we can safely detect and act upon them.
This adds a /proc/meminfo entry which can be easily used by scripts.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tomas Mudrunka <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
vmalloc_init() is called only from mm_core_init(), so there is no need
to declare it in include/linux/vmalloc.h.
Move the vmalloc_init() declaration to mm/internal.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
kmem_cache_init() is called only from mm_core_init(), so there is no
need to declare it in include/linux/slab.h.
Move the kmem_cache_init() declaration to mm/slab.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
mem_init_print_info() is only called from mm_core_init().
Move it close to the caller and make it static.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When deferred initialization of struct pages is enabled, page_ext_init()
must be called after all the deferred initialization is done, but there is
no point to keep it a separate call from kernel_init_freeable() right
after page_alloc_init_late().
Fold the call to page_ext_init() into page_alloc_init_late() and localize
deferred_struct_pages variable.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
init_mem_debugging_and_hardening() is only called from mm_core_init().
Move it close to the caller, make it static and rename it to
mem_debugging_and_hardening_init() for consistency with surrounding
convention.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Drop pgtable_init() as it has no real value and its name is
misleading.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Sergei Shtylyov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Make mm_init() a part of the mm/ codebase. mm_core_init() better
describes what the function does and does not clash with mm_init() in
kernel/fork.c.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The page_alloc_init() name is really misleading, because all this
function does is set up CPU hotplug callbacks for the page allocator.
Rename it to page_alloc_init_cpuhp() so that the name reflects what the
function does.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The bulk of memory management initialization code is spread all over
mm/page_alloc.c and makes navigating through page allocator functionality
difficult.
Move most of the functions marked __init and __meminit to mm/mm_init.c to
make it better localized and allow some more spare room before
mm/page_alloc.c reaches 10k lines.
No functional changes.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: Doug Berger <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The get_page_from_free_area() helper is only used in mm/page_alloc.c,
so move it there to reduce noise in include/linux/mmzone.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport (IBM) <[email protected]>
Reviewed-by: Lorenzo Stoakes <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new PTE
to resolve a missing fault, one can install a write-protected one. This
is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in combination.
This was motivated by testing HugeTLB HGM [1], and in particular its
interaction with userfaultfd features. Existing userfaultfd code supports
using WP and MINOR modes together (i.e. you can register an area with
both enabled), but without this CONTINUE flag the combination is in
practice unusable.
So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing as
UFFDIO_COPY_MODE_WP, but for *minor* faults.
Update the selftest to do some very basic exercising of the new flag.
Update Documentation/ to describe how these flags are used (neither the
COPY nor the new CONTINUE versions of this mode flag were described there
before).
[1]: https://patchwork.kernel.org/project/linux-mm/cover/[email protected]/
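From userspace the new flag is used like this (a sketch; uffd, addr and
len are assumed to be set up already, definitions from
<linux/userfaultfd.h>):

  struct uffdio_continue cont = {
          .range = { .start = (unsigned long)addr, .len = len },
          /* resolve the minor fault but keep the pte wr-protected */
          .mode  = UFFDIO_CONTINUE_MODE_WP,
  };

  if (ioctl(uffd, UFFDIO_CONTINUE, &cont) < 0)
          err(1, "UFFDIO_CONTINUE");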
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Peter Xu <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Many userfaultfd ioctl functions take both a 'mode' and a 'wp_copy'
argument. In future commits we plan to plumb the flags through to more
places, so we'd be proliferating the very long argument list even further.
Let's take the time to simplify the argument list. Combine the two
arguments into one - and generalize, so when we add more flags in the
future, it doesn't imply more function arguments.
Since the modes (copy, zeropage, continue) are mutually exclusive, store
them as an integer value (0, 1, 2) in the low bits. Place combine-able
flag bits in the high bits.
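A sketch of the combined type (names approximate the actual patch):

  typedef unsigned int uffd_flags_t;

  /* Mutually exclusive modes live in the low bits... */
  enum mfill_atomic_mode {
          MFILL_ATOMIC_COPY,
          MFILL_ATOMIC_ZEROPAGE,
          MFILL_ATOMIC_CONTINUE,
          NR_MFILL_ATOMIC_MODES,
  };

  /* ...and combinable flags sit above them. */
  #define MFILL_ATOMIC_MODE_MASK  0x3u    /* covers the three modes */
  #define MFILL_ATOMIC_WP         0x4u    /* example combinable flag */

  static inline bool uffd_flags_mode_is(uffd_flags_t flags,
                                        enum mfill_atomic_mode mode)
  {
          return (flags & MFILL_ATOMIC_MODE_MASK) == (uffd_flags_t)mode;
  }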
This is quite similar to an earlier patch proposed by Nadav Amit
("userfaultfd: introduce uffd_flags" [1]). The main difference is that
patch only handled flags, whereas this patch *also* combines the "mode"
argument into the same type to shorten the argument list.
[1]: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: James Houghton <[email protected]>
Acked-by: Peter Xu <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Quite a few userfaultfd functions took both mm and vma pointers as
arguments. Since the mm is trivially accessible via vma->vm_mm, there's
no reason to pass both; it just needlessly extends the already long
argument list.
Get rid of the mm pointer, where possible, to shorten the argument list.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Peter Xu <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm: userfaultfd: refactor and add UFFDIO_CONTINUE_MODE_WP",
v5.
- Commits 1-3 refactor userfaultfd ioctl code without behavior changes, with the
main goal of improving consistency and reducing the number of function args.
- Commit 4 adds UFFDIO_CONTINUE_MODE_WP.
This patch (of 4):
The basic problem is, over time we've added new userfaultfd ioctls, and
we've refactored the code so functions which used to handle only one case
are now re-used to deal with several cases. While this happened, we
didn't bother to rename the functions.
Similarly, as we added new functions, we cargo-culted pieces of the
now-inconsistent naming scheme, so those functions too ended up with names
that don't make a lot of sense.
A key point here is, "copy" in most userfaultfd code refers specifically
to UFFDIO_COPY, where we allocate a new page and copy its contents from
userspace. There are many functions with "copy" in the name that don't
actually do this (at least in some cases).
So, rename things into a consistent scheme. The high level idea is that
the call stack for userfaultfd ioctls becomes:
userfaultfd_ioctl
  -> userfaultfd_(particular ioctl)
    -> mfill_atomic_(particular kind of fill operation)
      -> mfill_atomic /* loops over pages in range */
        -> mfill_atomic_pte /* deals with single pages */
          -> mfill_atomic_pte_(particular kind of fill operation)
            -> mfill_atomic_install_pte
There are of course some special cases (shmem, hugetlb), but this is the
general structure which all function names now adhere to.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Axel Rasmussen <[email protected]>
Acked-by: Peter Xu <[email protected]>
Acked-by: Mike Rapoport (IBM) <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: James Houghton <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
MAX_ORDER is currently defined as the number of orders the page
allocator supports: users can ask the buddy allocator for page orders
between 0 and MAX_ORDER-1.
This definition is counter-intuitive and has led to a number of bugs
all over the kernel.
Change the definition of MAX_ORDER to be inclusive: the range of orders
users can ask the buddy allocator for is now 0..MAX_ORDER.
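Concretely, iterating over all supported orders changes from an
exclusive to an inclusive bound (do_something() is a placeholder):

  /* Before: MAX_ORDER was exclusive. */
  for (order = 0; order < MAX_ORDER; order++)
          do_something(order);

  /* After: MAX_ORDER is the largest valid order. */
  for (order = 0; order <= MAX_ORDER; order++)
          do_something(order);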
[[email protected]: fix min() warning]
Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[[email protected]: fix another min_t warning]
[[email protected]: fixups per Zi Yan]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix underlining in docs]
Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Michael Ellerman <[email protected]> [powerpc]
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/uffd: Add feature bit UFFD_FEATURE_WP_UNPOPULATED", v4.
The new feature bit makes anonymous memory acts the same as file memory on
userfaultfd-wp in that it'll also wr-protect none ptes.
It can be useful in two cases:
(1) Uffd-wp app that needs to wr-protect none ptes like QEMU snapshot,
so pre-fault can be replaced by enabling this flag and speed up
protections
(2) It helps to implement async uffd-wp mode that Muhammad is working on [1]
It's debatable whether this is the most ideal solution, because with
the new feature bit set, wr-protecting none ptes needs to pre-populate
the pgtables down to the last level (PAGE_SIZE). But it seems fine so
far for either purpose above, so we can leave optimizations for later.
The series brings pte markers to anonymous memory too. There's some
change in the common mm code path in the 1st patch; it would be great
to have some eyes on it, but hopefully it's still relatively
straightforward.
This patch (of 2):
This is a new feature that controls how uffd-wp handles none ptes.
When it's set, the kernel will handle anonymous memory the same way as
file memory, by allowing the user to wr-protect unpopulated ptes.
File memory handles none ptes consistently, by allowing them to be
wr-protected, because it cannot know whether the page cache exists or
not. For anonymous memory it was not as consistent, because we used to
assume that we don't need protection on none ptes or known zero pages.
One use case of such a feature bit is VM live snapshotting, where
without wr-protecting empty ptes the snapshot can contain random
rubbish in the holes of the anonymous memory, which can cause the guest
to misbehave when the guest OS assumes the pages should all be zeros.
QEMU worked around it by pre-populating the section with reads to fill
in zero page entries before starting the whole snapshot process [1].
Recently there was another need raised, for using userfaultfd
wr-protect to detect dirty pages (to replace soft-dirty in some cases)
[2]. In that case, without being able to wr-protect none ptes by
default, the dirty info can get lost, since we cannot treat every none
pte as dirty (the current design identifies a page as dirty based on
the uffd-wp bit being cleared).
In general, we want to be able to wr-protect empty ptes too, even for
anonymous memory.
This patch implements UFFD_FEATURE_WP_UNPOPULATED so that uffd-wp
handling of none ptes is consistent no matter what memory type is
underneath. It doesn't have any impact on file memory so far, because
we already have pte markers taking care of that; so it only affects
anonymous memory. Usage is sketched below.
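From userspace, usage looks roughly like this (a sketch; uffd, addr and
len are assumed to be set up already):

  struct uffdio_api api = {
          .api      = UFFD_API,
          .features = UFFD_FEATURE_WP_UNPOPULATED,
  };
  if (ioctl(uffd, UFFDIO_API, &api) < 0)
          err(1, "UFFDIO_API");

  struct uffdio_writeprotect wp = {
          .range = { .start = (unsigned long)addr, .len = len },
          .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
  };
  /* With the feature bit set, none ptes in the range now get pte
   * markers instead of being skipped. */
  if (ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) < 0)
          err(1, "UFFDIO_WRITEPROTECT");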
The feature bit is off by default, so the old behavior is maintained.
That may sometimes be wanted, because wr-protecting none ptes incurs
overhead not only during UFFDIO_WRITEPROTECT (by applying pte markers
to anonymous memory), but also when creating the pgtables to store the
pte markers. So there's potentially less chance of using THP on the
first fault for a none pmd or anything larger than a pmd.
The major implementation part is teaching the whole kernel to
understand pte markers even for anonymously mapped ranges, meanwhile
allowing the UFFDIO_WRITEPROTECT ioctl to apply pte markers to
anonymous memory too when the new feature bit is set.
Note that even though the patch subject starts with mm/uffd, there are
a few small refactors to the major mm paths for handling anonymous page
faults, but they should be straightforward.
With WP_UNPOPULATED, an application like QEMU can avoid pre-read
faulting all the memory before wr-protecting it when taking a live
snapshot. Quoting from Muhammad's test result here [3], based on a
simple program [4]:
(1) With huge page disabled
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
./uffd_wp_perf
Test DEFAULT: 4
Test PRE-READ: 1111453 (pre-fault 1101011)
Test MADVISE: 278276 (pre-fault 266378)
Test WP-UNPOPULATE: 11712
(2) With Huge page enabled
echo always > /sys/kernel/mm/transparent_hugepage/enabled
./uffd_wp_perf
Test DEFAULT: 4
Test PRE-READ: 22521 (pre-fault 22348)
Test MADVISE: 4909 (pre-fault 4743)
Test WP-UNPOPULATE: 14448
There'll be a great perf boost for the no-THP case, while for the
THP-enabled extreme case of all-THP-zero, WP_UNPOPULATED can be slower
than MADVISE; but that has low probability in reality. Also, the
overhead is not reduced but postponed until a follow-up write on any
huge zero THP, so it is potentially faster overall at the cost of
making the follow-up writes slower.
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/Y+v2HJ8+3i%2FKzDBu@x1n/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://github.com/xzpeter/clibs/blob/master/uffd-test/uffd-wp-perf.c
[[email protected]: comment changes, oneliner fix to khugepaged]
Link: https://lkml.kernel.org/r/ZB2/8jPhD3fpx5U8@x1n
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Peter Xu <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Muhammad Usama Anjum <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Paul Gofman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Instead of returning NULL for all errors, distinguish between:
- no entry found and not asked to allocate (-ENOENT)
- failed to allocate memory (-ENOMEM)
- would block (-EAGAIN)
so that callers don't have to guess the error based on the passed-in
flags.
Also pass the error through the direct callers: filemap_get_folio(),
filemap_lock_folio(), filemap_grab_folio() and
filemap_get_incore_folio().
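Callers now check the error instead of guessing from the flags, e.g.
(a sketch):

  struct folio *folio = filemap_lock_folio(mapping, index);

  if (IS_ERR(folio)) {
          if (PTR_ERR(folio) == -ENOENT) {
                  /* no folio at this index: create one or bail */
          }
          return PTR_ERR(folio);
  }
  /* ... use the locked folio, then release it ... */
  folio_unlock(folio);
  folio_put(folio);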
[[email protected]: fix null-pointer deref]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Ryusuke Konishi <[email protected]> [nilfs2]
Cc: Andreas Gruenbacher <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
FGP_ENTRY is unused now, so remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
mapping_get_entry is useful for page cache API users that need to know
about xa_value internals. Rename it and make it available in pagemap.h.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
This patch adds support for platform-specific reset logic in the stmmac
driver. Some SoCs require a different reset mechanism than the standard
dwmac IP reset. To support these platforms, a new function pointer
'fix_soc_reset' is added to the plat_stmmacenet_data structure.
stmmac_reset() in hwif.h is modified to call the 'fix_soc_reset'
function if it exists. This enables the driver to use the
platform-specific reset logic when necessary.
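A sketch of the hook (close to the described change; exact signatures
may differ):

  struct plat_stmmacenet_data {
          /* ... */
          int (*fix_soc_reset)(void *priv, void __iomem *ioaddr);
  };

  static inline int stmmac_reset(struct stmmac_priv *priv,
                                 void __iomem *ioaddr)
  {
          struct plat_stmmacenet_data *plat = priv ? priv->plat : NULL;

          if (plat && plat->fix_soc_reset)
                  return plat->fix_soc_reset(plat, ioaddr);

          /* fall back to the standard dwmac IP reset */
          return stmmac_do_callback(priv, dma, reset, ioaddr);
  }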
Signed-off-by: Shenwei Wang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use the maple tree in RCU mode for VMA tracking.
The maple tree tracks the stack and is able to update the pivot
(lower/upper boundary) in-place to allow the page fault handler to write
to the tree while holding just the mmap read lock. This is safe as the
writes to the stack have a guard VMA which ensures there will always be a
NULL in the direction of the growth and thus will only update a pivot.
It is possible, but not recommended, to have VMAs that grow up/down
without guard VMAs. syzbot has constructed a testcase which sets up a VMA
to grow and consume the empty space. Overwriting the entire NULL entry
causes the tree to be altered in a way that is not safe for concurrent
readers; the readers may see a node being rewritten or one that does not
match the maple state they are using.
Enabling RCU mode allows the concurrent readers to see a stable node and
will return the expected result.
[[email protected]: we don't need to free the nodes with RCU]
Link: https://lore.kernel.org/linux-mm/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Reported-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Include the device and connector information in the SCDC debug
messages. This makes it easier to figure out who did what.
v2: Rely on connector->ddc (Maxime)
Cc: Andrzej Hajda <[email protected]>
Cc: Neil Armstrong <[email protected]>
Cc: Robert Foss <[email protected]>
Cc: Laurent Pinchart <[email protected]>
Cc: Jonas Karlman <[email protected]>
Cc: Jernej Skrabec <[email protected]>
Cc: Thierry Reding <[email protected]>
Cc: Emma Anholt <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Laurent Pinchart <[email protected]>
Acked-by: Maxime Ripard <[email protected]>
Reviewed-by: Andrzej Hajda <[email protected]>
Acked-by: Thierry Reding <[email protected]>
|
|
Merge series from William Breathitt Gray <[email protected]>:
The regmap API supports IO port accessors so we can take advantage of
regmap abstractions rather than handling access to the device registers
directly in the driver.
A patch to pass irq_drv_data as a parameter for struct regmap_irq_chip
set_type_config() is included. This is needed by the
idio_24_set_type_config() and ws16c48_set_type_config() callbacks in
order to update the type configuration on their respective devices.
|
|
Refactor pci_bus_for_each_resource() in the same way as
pci_dev_for_each_resource(). This allows the index to be hidden inside the
implementation so the caller can omit it when it's not used otherwise.
No functional changes intended.
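With the refactor, a caller that does not need the index can write (a
sketch):

  struct resource *res;

  /* The iteration index is now maintained internally. */
  pci_bus_for_each_resource(bus, res) {
          if (res)
                  dev_info(&bus->dev, "window %pR\n", res);
  }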
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Krzysztof Wilczyński <[email protected]>
Reviewed-by: Philippe Mathieu-Daudé <[email protected]>
|
|
All users of these fields have been removed.
They are now computed when needed with [mn]shift and [mn]width.
This shrinks the size of struct clk_fractional_divider from 72 to 56 bytes.
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/680357e5acb338433bfc94114b65b4a4ce2c99e2.1680423909.git.christophe.jaillet@wanadoo.fr
Reviewed-by: Heiko Stuebner <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
|
|
Allow callers of __acpi_video_get_backlight_type() to pass a pointer
to a bool which will get set to false if the backlight-type comes from
the cmdline or a DMI quirk, and set to true if auto-detection was used.
Also make __acpi_video_get_backlight_type() non-static so that it can
be called directly from outside of video_detect.c.
While at it, turn the acpi_video_get_backlight_type() and
acpi_video_backlight_use_native() wrappers into static inline functions
in include/acpi/video.h, so that we need to export one less symbol.
Fixes: 5aa9d943e9b6 ("ACPI: video: Don't enable fallback path for creating ACPI backlight by default")
Cc: All applicable <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Provide a module_nvmem_layout_driver() macro at the end of the
nvmem-provider.h header to reduce the boilerplate when registering nvmem
layout drivers.
Suggested-by: Srinivas Kandagatla <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Acked-by: Rafał Miłecki <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The .read_post_process() callback is designed to modify raw cell
content before providing it to the consumer. So far we were dealing
with modifications that didn't affect the cell size (length). In some
cases, however, cell content needs to be reformatted and resized.
This is required e.g. to provide a properly formatted MAC address in
case it's stored in a non-binary format (e.g. ASCII).
There were a few discussions on how to handle that optimally. The
following possible solutions were considered:
1. Allow .read_post_process() to realloc (resize) the content buffer
2. Allow .read_post_process() to adjust (decrease) just the buffer
length
3. Register NVMEM cells using post-read sizes
The preferred solution was the last one. The problem is that simply
adjusting "bytes" in NVMEM providers would result in the core code NOT
passing the whole raw data to the .read_post_process() callbacks. It
means the callback functions couldn't do their job without somehow
manually reading the original cell content on their own.
This patch deals with that by registering NVMEM cells with both
lengths: the raw content one and the post-read one. It allows:
1. The core code to read the whole raw cell content
2. Callbacks to return the content they want
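A sketch of registering such a cell; the raw-length field name and the
callback are assumptions for illustration:

  struct nvmem_cell_info cell = {
          .name              = "mac-address",
          .offset            = 0x10,
          .bytes             = 6,   /* post-read size consumers see */
          .raw_len           = 17,  /* "xx:xx:xx:xx:xx:xx" in ASCII */
          .read_post_process = mac_ascii_to_bin,  /* hypothetical */
  };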
Signed-off-by: Rafał Miłecki <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
It doesn't make sense anymore to have an opaque pointer set up by the
nvmem device. Usually, the layout isn't associated with a particular
nvmem device. Instead, let the caller who sets the post process
callback provide the priv pointer.
Signed-off-by: Michael Walle <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
There are no users left for the global cell_post_process callback.
New users should use proper nvmem layouts.
Signed-off-by: Michael Walle <[email protected]>
Signed-off-by: Miquel Raynal <[email protected]>
Signed-off-by: Srinivas Kandagatla <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|