|
This makes 'damon-dbgfs' support the data access monitoring-oriented
memory management schemes. Users can read and update the schemes using
the ``<debugfs>/damon/schemes`` file. The format is::
<min/max size> <min/max access frequency> <min/max age> <action>
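For illustration, installing a scheme from user space could look like
the below (a hedged example: the numeric action encoding and the units
of each field are assumptions here, not stated in this message):

    # echo "4096 8192000 0 5 10 20 2" > <debugfs>/damon/schemes

i.e., "for regions of 4 KiB to ~8 MiB, with 0-5 monitored accesses,
aged 10-20 aggregation intervals, apply action number 2".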
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Amit Shah <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Leonard Foerster <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Markus Boehme <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This makes DAMON's default primitives for virtual address spaces
support DAMON-based Operation Schemes (DAMOS) by implementing action
application functions and registering them to the monitoring context.
The implementation simply links 'madvise()' to the related DAMOS
actions. That is, 'madvise(MADV_WILLNEED)' is called for the 'WILLNEED'
DAMOS action, and similarly for the other actions ('COLD', 'PAGEOUT',
'HUGEPAGE', 'NOHUGEPAGE'). So, kernel space DAMON users can now use the
DAMON-based optimizations with only a small amount of code.
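Conceptually, the linkage is a mapping from DAMOS actions to madvise()
behaviors, roughly as below (an illustrative sketch; the helper name is
made up, and only the action-to-behavior pairing is taken from this
message):

	static int damos_madvise_behavior(enum damos_action action)
	{
		switch (action) {
		case DAMOS_WILLNEED:	return MADV_WILLNEED;
		case DAMOS_COLD:	return MADV_COLD;
		case DAMOS_PAGEOUT:	return MADV_PAGEOUT;
		case DAMOS_HUGEPAGE:	return MADV_HUGEPAGE;
		case DAMOS_NOHUGEPAGE:	return MADV_NOHUGEPAGE;
		default:		return -EINVAL;	/* unsupported action */
		}
	}

The selected behavior is then applied to the target region via the
in-kernel equivalent of madvise().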
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Amit Shah <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Leonard Foerster <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Markus Boehme <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In many cases, users might use DAMON for simple data access aware
memory management optimizations such as applying an operation scheme to
a memory region of a specific size having a specific access frequency
for a specific time. For example, "page out a memory region larger than
100 MiB that has had a low access frequency for more than 10 minutes",
or "use THP for a memory region larger than 2 MiB having a high access
frequency for more than 2 seconds".
The simplest form of the solution would be doing offline data access
pattern profiling using DAMON and modifying the application source code
or system configuration based on the profiling results. Alternatively,
one could develop a daemon constructed of two modules (one for access
monitoring and the other for applying memory management actions via
mlock(), madvise(), sysctl, etc.).
To avoid users spending their time implementing such simple data access
monitoring-based operation schemes, this makes DAMON handle such
schemes directly. With this change, users can simply specify their
desired schemes to DAMON. Then, DAMON will automatically apply the
schemes to the user-specified target processes.
Each of the schemes is composed of conditions for filtering the target
memory regions and the desired memory management action for the target.
Specifically, the format is::
<min/max size> <min/max access frequency> <min/max age> <action>
The filtering conditions are the size of the memory region, the number
of accesses to the region monitored by DAMON, and the age of the
region. The age of a region is incremented periodically but reset when
its addresses or access frequency have significantly changed or the
action of a scheme was applied. For the action, the current
implementation supports a few madvise()-like hints: ``WILLNEED``,
``COLD``, ``PAGEOUT``, ``HUGEPAGE``, and ``NOHUGEPAGE``.
Because DAMON supports various address spaces and the application of
the actions to a monitoring target region depends on the type of the
target address space, the application code should be implemented by
each set of primitives and registered to the framework. Note that this
only implements the framework part. The following commit will implement
the action applications for the virtual address spaces primitives.
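For instance, a primitives implementation would register its handler
during setup, along these lines (a sketch; 'apply_scheme' as the
callback field name and the handler name are assumptions based on this
series, not quotes of the code):

	/* register the address-space specific scheme application hook */
	ctx->primitive.apply_scheme = damon_va_apply_scheme;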
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Amit Shah <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Leonard Foerster <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Markus Boehme <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Implement Data Access Monitoring-based Memory Operation Schemes".
Introduction
============
DAMON[1] can be used as a primitive for data access aware memory
management optimizations. For that, users who want such optimizations
should run DAMON, read the monitoring results, analyze them, plan a new
memory management scheme, and apply the new scheme by themselves. Such
efforts will be inevitable for some complicated optimizations.
However, in many other cases, the users would simply want the system to
apply a memory management action to a memory region of a specific size
having a specific access frequency for a specific time. For example,
"page out a memory region larger than 100 MiB that has seen only rare
accesses for more than 2 minutes", or "do not use THP for a memory
region larger than 2 MiB rarely accessed for more than 1 second".
To make this work easier and non-redundant, this patchset implements a
new feature of DAMON, which is called Data Access Monitoring-based
Operation Schemes (DAMOS). Using the feature, users can describe such
common schemes in a simple way and ask DAMON to execute them on its
own.
[1] https://damonitor.github.io
Evaluations
===========
DAMOS is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of resident sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).
NOTE that the experimental THP optimization and proactive reclamation
are not for production but only proofs of concept.
Please refer to the showcase web site's evaluation document[1] for
detailed evaluation setup and results.
[1] https://damonitor.github.io/doc/html/v34/vm/damon/eval.html
Long-term Support Trees
-----------------------
For people who want to test DAMON but use LTS kernels, there are a
couple of additional trees, based on the two latest LTS kernels
respectively and containing the 'damon/master' backports.
- For v5.4.y: https://git.kernel.org/sj/h/damon/for-v5.4.y
- For v5.10.y: https://git.kernel.org/sj/h/damon/for-v5.10.y
Sequence Of Patches
===================
The 1st patch accounts the age of each region. The 2nd patch implements
the core of the DAMON-based operation schemes feature. The 3rd patch
makes the default monitoring primitives for virtual address spaces
support the schemes. From this point, kernel space users can use DAMOS.
The 4th patch exports the feature to user space via the debugfs
interface. The 5th patch implements a schemes statistics feature for
easier tuning of the schemes and runtime access pattern analysis, and
the 6th patch adds selftests for these changes. Finally, the 7th patch
documents this new feature.
This patch (of 7):
DAMON can be used for data access pattern aware memory management
optimizations. For that, users should run DAMON, read the monitoring
results, analyze them, plan a new memory management scheme, and apply
the new scheme by themselves. It would not be too hard, but still
requires some level of effort. For complicated cases, this effort is
inevitable.
That said, in many cases, users would simply want to apply an action to
a memory region of a specific size having a specific access frequency
for a specific time. For example, "page out a memory region larger than
100 MiB that has had a low access frequency for more than 10 minutes",
or "use THP for a memory region larger than 2 MiB having a high access
frequency for more than 2 seconds".
For such optimizations, users would need to first account the age of
each region themselves. To reduce such efforts, this implements simple
age accounting for each region in DAMON. For each aggregation step,
DAMON compares the access frequency with that from the last aggregation
and resets the age of the region if the change is significant;
otherwise, the age is incremented. Also, when regions are merged, the
size-weighted average of their ages is set as the age of the merged
region.
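The merge rule can be sketched as follows (illustrative, not
necessarily the literal kernel code):

	/* merge region r into l: the age becomes the size-weighted average */
	sz_l = l->ar.end - l->ar.start;
	sz_r = r->ar.end - r->ar.start;
	l->age = (l->age * sz_l + r->age * sz_r) / (sz_l + sz_r);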
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Amit Shah <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Marco Elver <[email protected]>
Cc: Leonard Foerster <[email protected]>
Cc: Greg Thelen <[email protected]>
Cc: Markus Boehme <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently a plain integer is being used to nullify the pointer
ctx->kdamond. Use NULL instead. Cleans up sparse warning:
mm/damon/core.c:317:40: warning: Using plain integer as NULL pointer
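The change is a one-liner of this shape (illustrative):

	-	ctx->kdamond = 0;
	+	ctx->kdamond = NULL;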
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Just get the pid via 'current->pid'. Meanwhile, to be symmetrical, make
the 'starts' and 'finishes' logs both use the debug level.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Changbin Du <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Just return from the kthread function.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Changbin Du <[email protected]>
Cc: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Logging of kdamond startup uses 'pr_info()' unnecessarily. This makes
it use 'pr_debug()' instead.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A few Kernel-doc comments in 'damon.h' are broken. This fixes them.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Building the DAMON documents warns about a reference to a nonexistent
document, as below:
$ time make htmldocs
[...]
Documentation/vm/damon/index.rst:24: WARNING: toctree contains reference to nonexisting document 'vm/damon/plans'
This fixes the warning by removing the wrong reference.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This updates SeongJae's email address in MAINTAINERS file to his
preferred one.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Most memory management user guide documents are in 'admin-guide/mm/',
but two of them are in 'vm/'. This moves the two docs into
'admin-guide/mm' to make them easier to find.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Correct a singular versus plural grammar mistake in the help text for
the DAMON_VADDR config symbol.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 3f49584b262cf8f4 ("mm/damon: implement primitives for the virtual memory address spaces")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: SeongJae Park <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We have observed that on very large machines with newer CPUs, the
static key/branch switching delay is on the order of milliseconds.
This is due to the required broadcast IPIs, which simply do not scale
well to hundreds of CPUs (cores). If done too frequently, this can
adversely affect tail latencies of various workloads.
One workaround is to increase the sample interval to several seconds,
while decreasing sampled allocation coverage, but the problem still
exists and could still increase tail latencies.
As already noted in the Kconfig help text, there are trade-offs: at
lower sample intervals the dynamic branch results in better performance;
however, at very large sample intervals, the static keys mode can result
in better performance -- careful benchmarking is recommended.
Our initial benchmarking showed that with large enough sample intervals
and workloads stressing the allocator, the static keys mode was slightly
better. Evaluating and observing the possible system-wide side-effects
of the static-key-switching induced broadcast IPIs, however, was a blind
spot (in particular on large machines with 100s of cores).
Therefore, a major downside of the static keys mode is, unfortunately,
that it is hard to predict performance on new system architectures and
topologies, and also hard to draw conclusions about the performance of
new workloads from a limited set of benchmarks.
Most distributions will simply select the defaults, while targeting a
large variety of different workloads and system architectures. As such,
the better default is CONFIG_KFENCE_STATIC_KEYS=n, and re-enabling it is
only recommended after careful evaluation.
For reference, on x86-64 the condition in kfence_alloc() generates
exactly 2 instructions in the kmem_cache_alloc() fast-path:
| ...
| cmpl $0x0,0x1a8021c(%rip) # ffffffff82d560d0 <kfence_allocation_gate>
| je ffffffff812d6003 <kmem_cache_alloc+0x243>
| ...
which, given kfence_allocation_gate is infrequently modified, should be
well predicted by most CPUs.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Regardless of KFENCE mode (CONFIG_KFENCE_STATIC_KEYS: either using
static keys to gate allocations, or using a simple dynamic branch),
always use a static branch to avoid the dynamic branch in kfence_alloc()
if KFENCE was disabled at boot.
For CONFIG_KFENCE_STATIC_KEYS=n, this now avoids the dynamic branch if
KFENCE was disabled at boot.
To simplify, this also unifies the location where
kfence_allocation_gate is read-checked to be inline in kfence_alloc().
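The resulting fast path for CONFIG_KFENCE_STATIC_KEYS=n then has
roughly the following shape (a hedged sketch built from the names
appearing in this log, not a quote of the source):

	static __always_inline void *kfence_alloc(struct kmem_cache *s, size_t size,
						  gfp_t flags)
	{
		/* static branch: patched at boot depending on whether KFENCE is on */
		if (!static_branch_unlikely(&kfence_allocation_key))
			return NULL;
		/* dynamic gate, read-checked inline as described above */
		if (likely(atomic_read(&kfence_allocation_gate)))
			return NULL;
		return __kfence_alloc(s, size, flags);
	}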
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Initializing memory and setting/checking the canary bytes is relatively
expensive, and doing so in the meta->lock critical sections extends the
duration with preemption and interrupts disabled unnecessarily.
Any reads to meta->addr and meta->size in kfence_guarded_alloc() and
kfence_guarded_free() don't require locking meta->lock as long as the
object is removed from the freelist: only kfence_guarded_alloc() sets
meta->addr and meta->size after removing it from the freelist, which
requires a preceding kfence_guarded_free() returning it to the list or
the initial state.
Therefore move reads to meta->addr and meta->size, including expensive
memory initialization using them, out of meta->lock critical sections.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Jann Horn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the new kunit_skip() to skip tests if requirements were not met. It
makes it easier to see in KUnit's summary if there were skipped tests.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Reviewed-by: David Gow <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Aleksandr Nogikh <[email protected]>
Cc: Taras Madan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a note briefly mentioning the new policy about "skipping currently
covered allocations if pool close to full." Since this has a notable
impact on KFENCE's bug-detection ability on systems with large uptimes,
it is worth pointing out the feature.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Aleksandr Nogikh <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Taras Madan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
One of KFENCE's main design principles is that with increasing uptime,
allocation coverage increases sufficiently to detect previously
undetected bugs.
We have observed that frequent long-lived allocations of the same
source (e.g. pagecache) tend to permanently fill up the KFENCE pool
with increasing system uptime, thus breaking the above requirement.
The workaround thus far had been increasing the sample interval and/or
increasing the KFENCE pool size, but neither is a reliable solution.
To ensure diverse coverage of allocations, limit currently covered
allocations of the same source once pool utilization reaches 75%
(configurable via `kfence.skip_covered_thresh`) or above. The effect is
retaining reasonable allocation coverage when the pool is close to full.
A side-effect is that this also limits frequent long-lived allocations
of the same source filling up the pool permanently.
Uniqueness of an allocation for coverage purposes is based on its
(partial) allocation stack trace (the source). A Counting Bloom filter
is used to check if an allocation is covered; if the allocation is
currently covered, the allocation is skipped by KFENCE.
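The check against a Counting Bloom filter can be sketched as below
(illustrative; the array size, hash count, and names are assumptions,
with hash_32() taken from the fixup note):

	#define ALLOC_COVERED_ORDER	7	/* assumed example size: 128 slots */
	#define ALLOC_COVERED_SIZE	(1 << ALLOC_COVERED_ORDER)
	static atomic_t alloc_covered[ALLOC_COVERED_SIZE];

	static bool alloc_covered_contains(u32 alloc_stack_hash)
	{
		int i;

		for (i = 0; i < 2; i++) {	/* 2 hash functions */
			if (!atomic_read(&alloc_covered[alloc_stack_hash % ALLOC_COVERED_SIZE]))
				return false;	/* definitely not covered */
			alloc_stack_hash = hash_32(alloc_stack_hash, ALLOC_COVERED_ORDER);
		}
		return true;	/* likely covered; Bloom filters may false-positive */
	}

Allocations increment and frees decrement the same counters, so entries
age out as covered objects are freed.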
Testing was done using:
(a) a synthetic workload that performs frequent long-lived
allocations (default config values; sample_interval=1;
num_objects=63), and
(b) normal desktop workloads on an otherwise idle machine where
the problem was first reported after a few days of uptime
(default config values).
In both test cases the sampled allocation rate no longer drops to zero
at any point. In the case of (b) we observe (after 2 days uptime) 15%
unique allocations in the pool, 77% pool utilization, with 20% "skipped
allocations (covered)".
[[email protected]: simplify and just use hash_32(), use more random stack_hash_seed]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: fix 32 bit]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Aleksandr Nogikh <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Taras Madan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Move the saving of the stack trace of allocations into __kfence_alloc(),
so that the stack entries array can be used outside of
kfence_guarded_alloc() and we avoid potentially unwinding the stack
multiple times.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Aleksandr Nogikh <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Taras Madan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Maintain a counter to count allocations that are skipped because they
are incompatible (oversized, incompatible gfp flags) or because there
was no capacity.
This is to compute the fraction of allocations that could not be
serviced by KFENCE, which we expect to be rare.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Reviewed-by: Dmitry Vyukov <[email protected]>
Acked-by: Alexander Potapenko <[email protected]>
Cc: Aleksandr Nogikh <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Taras Madan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
filter_irq_stacks() has little to do with the stackdepot implementation,
except that it is usually used by users (such as KASAN) of stackdepot to
reduce the stack trace.
However, filter_irq_stacks() itself is not useful without a stack trace
as obtained by stack_trace_save() and friends.
Therefore, move filter_irq_stacks() to kernel/stacktrace.c, so that new
users of filter_irq_stacks() do not have to start depending on
STACKDEPOT only for filter_irq_stacks().
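For reference, the typical usage pattern looks like this (a sketch of
the common idiom, not code from this patch):

	unsigned long entries[64];
	unsigned int nr_entries;
	depot_stack_handle_t handle;

	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
	nr_entries = filter_irq_stacks(entries, nr_entries);	/* trim IRQ frames */
	handle = stack_depot_save(entries, nr_entries, GFP_ATOMIC);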
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Aleksandr Nogikh <[email protected]>
Cc: Taras Madan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
nr_free_buffer_pages could be exposed through mm.h instead of swap.h.
The advantage of this change is that it can reduce obsolete includes.
For example, net/ipv4/tcp.c wouldn't need swap.h any more since it
already includes mm.h. Similarly, after checking all the other files,
it turns out that tcp.c, udp.c, meter.c, ... follow the same rule, so
swap.h can be removed from these files as well.
Moreover, after preprocessing all the files that use
nr_free_buffer_pages, it turns out that those files have already
included mm.h. Thus, we can move nr_free_buffer_pages from swap.h to
mm.h safely. This change will not affect the compilation of other
files.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mianhan Liu <[email protected]>
Cc: Jakub Kicinski <[email protected]>
CC: Ulf Hansson <[email protected]>
Cc: "David S . Miller" <[email protected]>
Cc: Simon Horman <[email protected]>
Cc: Pravin B Shelar <[email protected]>
Cc: Vlad Yasevich <[email protected]>
Cc: Marcelo Ricardo Leitner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This has served its purpose and is no longer used. All usercopy
violations appear to have been handled by now, any remaining instances
(or new bugs) will cause copies to be rejected.
This isn't a direct revert of commit 2d891fbc3bb6 ("usercopy: Allow
strict enforcement of whitelists"); since usercopy_fallback is
effectively 0, the fallback handling is removed too.
This also removes the usercopy_fallback module parameter on slab_common.
Link: https://github.com/KSPP/linux/issues/153
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Stephen Kitt <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Joel Stanley <[email protected]> [defconfig change]
Acked-by: David Rientjes <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: James Morris <[email protected]>
Cc: "Serge E . Hallyn" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This change introduces an aged idle interface to the existing idle sysfs
file for zram.
When CONFIG_ZRAM_MEMORY_TRACKING is enabled the idle file now also
accepts an integer argument. This integer is the age (in seconds) of
pages to mark as idle. The idle file still supports 'all' as it always
has. This new approach allows for much more control over which pages
get marked as idle.
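For example, usage could look as below (a hedged example; the device
name and the age value are illustrative):

	# mark pages not accessed for at least 300 seconds as idle
	echo 300 > /sys/block/zram0/idle
	# the long-standing behavior keeps working
	echo all > /sys/block/zram0/idle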
[[email protected]: use IS_ENABLED and cleanup comment]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: Sergey's cleanup suggestions]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Brian Geffon <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Nitin Gupta <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Suleiman Souhlal <[email protected]>
Cc: Jesse Barnes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
snprintf() returns the number of bytes it would have printed if there
were space, but it does not count the NUL terminator. That means that
if "count == copied", the output has already overflowed by one
character. This bug likely isn't super harmful in real life.
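For illustration, the problematic pattern is of this shape (a sketch,
not the literal driver code):

	copied = snprintf(buf, count, ...);
	/*
	 * snprintf() returns the length it would have written, excluding
	 * the NUL, so copied == count already means the output was
	 * truncated to count - 1 characters plus the terminator.
	 */
	if (copied == count)
		...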
Link: https://lkml.kernel.org/r/20210916130404.GA25094@kili
Fixes: c0265342bff4 ("zram: introduce zram memory tracking")
Signed-off-by: Dan Carpenter <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
read_from_bdev_async() is not called in atomic context, so GFP_NOIO can
be used rather than GFP_ATOMIC. With GFP_NOIO, reclaimable pages can be
reclaimed if needed, avoiding allocation failure and the resulting page
fault failure.
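The change is of this shape (illustrative):

	-	bio = bio_alloc(GFP_ATOMIC, 1);
	+	bio = bio_alloc(GFP_NOIO, 1);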
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jaewon Kim <[email protected]>
Reported-by: Yong-Taek Lee <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
kmap_atomic() is being deprecated in favor of kmap_local_page().
Replace the uses of kmap_atomic() within the highmem code.
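The replacement is mechanical in most spots (illustrative sketch):

	/* before: mapping side-effects include disabled preemption */
	addr = kmap_atomic(page);
	...
	kunmap_atomic(addr);

	/* after: the mapping is thread-local, but preemption stays possible */
	addr = kmap_local_page(page);
	...
	kunmap_local(addr);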
On profiling clear_huge_page() using ftrace, an improvement of 62% was
observed with the setup below.
Setup:-
The below data has been collected on Qualcomm's SM7250 SoC with THP
enabled (kernel v4.19.113), with only CPU-0 (Cortex-A55) and CPU-7
(Cortex-A76) switched on and set to max frequency, and DDR set to the
perf governor.
FTRACE Data:-
Base data:-
Number of iterations: 48
Mean of allocation time: 349.5 us
std deviation: 74.5 us
v4 data:-
Number of iterations: 48
Mean of allocation time: 131 us
std deviation: 32.7 us
The following simple userspace experiment, allocating 100MB (BUF_SZ) of
pages and writing to them, gave us good insight: we observed an
improvement of 42% in allocation and writing timings.
-------------------------------------------------------------
Test code snippet
-------------------------------------------------------------
clock_start();
buf = malloc(BUF_SZ); /* allocate 100 MB */
for (i = 0; i < BUF_SZ_PAGES; i++) {
	/* touch one word per page to fault it in */
	*((int *)(buf + (i * PAGE_SIZE))) = 1;
}
clock_end();
-------------------------------------------------------------
Malloc test timings for 100MB anon allocation:-
Base data:-
Number of iterations: 100
Mean of allocation time: 31831 us
std deviation: 4286 us
v4 data:-
Number of iterations: 100
Mean of allocation time: 18193 us
std deviation: 4915 us
[[email protected]: fix zero_user_segments()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: Prathu Baronia <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is one possible race window between zs_pool_dec_isolated() and
zs_unregister_migration() because wait_for_isolated_drain() checks the
isolated count without holding class->lock and there is no order inside
zs_pool_dec_isolated(). Thus the below race window could be possible:
    zs_pool_dec_isolated               zs_unregister_migration
      check pool->destroying != 0
                                         pool->destroying = true;
                                         smp_mb();
                                         wait_for_isolated_drain()
                                           wait for pool->isolated_pages == 0
      atomic_long_dec(&pool->isolated_pages);
      atomic_long_read(&pool->isolated_pages) == 0
Since zs_pool_dec_isolated() observes pool->destroying (still false)
before its atomic_long_dec() of pool->isolated_pages, the wakeup of
pool->migration_wait is missed. Fix this by ensuring that the check of
pool->destroying happens after atomic_long_dec(&pool->isolated_pages).
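With the fix, the decrement side looks roughly like below (a sketch;
the exact memory barriers in the real patch may differ):

	static void zs_pool_dec_isolated(struct zs_pool *pool)
	{
		atomic_long_dec(&pool->isolated_pages);
		/*
		 * Check pool->destroying only after the decrement, so that
		 * the waiter in zs_unregister_migration() cannot miss its
		 * wakeup.
		 */
		if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
			wake_up_all(&pool->migration_wait);
	}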
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 701d678599d0 ("mm/zsmalloc.c: fix race condition in zs_destroy_pool")
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Henry Burns <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
During migration, special page table entries are installed for each
page being migrated. These entries store the pfn and associated
permissions of the ptes mapping the page being migrated.
Device-private pages use special swap pte entries to distinguish
read-only vs. writeable pages which the migration code checks when
creating migration entries. Normally this follows a fast path in
migrate_vma_collect_pmd() which correctly copies the permissions of
device-private pages over to migration entries when migrating pages back
to the CPU.
However the slow-path falls back to using try_to_migrate() which
unconditionally creates read-only migration entries for device-private
pages. This leads to unnecessary double faults on the CPU as the new
pages are always mapped read-only even when they could be mapped
writeable. Fix this by correctly copying device-private permissions in
try_to_migrate_one().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Alistair Popple <[email protected]>
Reported-by: Ralph Campbell <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Cc: Jerome Glisse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's communicate driver-managed regions to memblock, to properly teach
kexec_file with CONFIG_ARCH_KEEP_MEMBLOCK to not place images on these
memory regions.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Jianyong Wu <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Shahab Vahedi <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED,
indicating that we're dealing with a memory region that is never
indicated in the firmware-provided memory map, but always detected and
added by a driver.
Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such
memory regions like ordinary MEMBLOCK_NONE memory regions -- for
example, when selecting memory regions to add to the vmcore for dumping
in the crashkernel via for_each_mem_range().
However, especially kexec_file is not supposed to select such memblocks
via for_each_free_mem_range() / for_each_free_mem_range_reverse() to
place kexec images, similar to how we handle
IORESOURCE_SYSRAM_DRIVER_MANAGED without CONFIG_ARCH_KEEP_MEMBLOCK.
We'll make sure that memory hotplug code sets the flag where applicable
(IORESOURCE_SYSRAM_DRIVER_MANAGED) next. This prepares architectures
that need CONFIG_ARCH_KEEP_MEMBLOCK, such as arm64, for virtio-mem
support.
Note that kexec *must not* indicate this memory to the second kernel and
*must not* place kexec-images on this memory. Let's add a comment to
kexec_walk_memblock(), documenting how we handle MEMBLOCK_DRIVER_MANAGED
now just like using IORESOURCE_SYSRAM_DRIVER_MANAGED in
locate_mem_hole_callback() for kexec_walk_resources().
Also note that MEMBLOCK_HOTPLUG cannot be reused due to different
semantics:
MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the
firmware-provided memory map and added to the system early during
boot; kexec *has to* indicate this memory to the second kernel and
can place kexec-images on this memory. After memory hotunplug,
kexec has to be re-armed. We mostly ignore this flag when
"movable_node" is not set on the kernel command line, because
then we're told to not care about hotunpluggability of such
memory regions.
MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in
the firmware-provided memory map; this memory is always detected
and added to the system by a driver; memory might not actually be
physically hotunpluggable. kexec *must not* indicate this memory to
the second kernel and *must not* place kexec-images on this memory.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Jianyong Wu <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Shahab Vahedi <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We want to specify flags when hotplugging memory. Let's prepare to pass
flags to memblock_add_node() by adjusting all existing users.
Note that when hotplugging memory the system is already up and running
and we might have concurrent memblock users: for example, while we're
hotplugging memory, kexec_file code might search for suitable memory
regions to place kexec images. It's important to add the memory
directly to memblock via a single call with the right flags, instead of
adding the memory first and applying the flags later: otherwise,
concurrent memblock users might temporarily stumble over memblocks with
wrong flags, which will be important in a follow-up patch that
introduces a new flag to properly handle add_memory_driver_managed().
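After the adjustment, existing callers simply pass MEMBLOCK_NONE
(illustrative):

	/* before */
	memblock_add_node(base, size, nid);
	/* after: the flags take effect atomically with the add */
	memblock_add_node(base, size, nid, MEMBLOCK_NONE);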
Link: https://lkml.kernel.org/r/[email protected]
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Shahab Vahedi <[email protected]> [arch/arc]
Reviewed-by: Mike Rapoport <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Jianyong Wu <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The description of MEMBLOCK_HOTPLUG is currently short and consequently
misleading: we're actually dealing with a memory region that might get
hotunplugged later (i.e., the platform+firmware supports it), yet it is
indicated in the firmware-provided memory map as System RAM that will
just get used by the system for any purpose when not taking special
care. The firmware marked this memory region as hot(un)plugged (e.g.,
hotplugged before reboot), implying that it might get hotunplugged
again later.
Whether we consider this information depends on the "movable_node"
kernel commandline parameter: only with "movable_node" set, we'll try
keeping this memory hotunpluggable, for example, by not serving early
allocations from this memory region and by letting the buddy manage it
using the ZONE_MOVABLE.
Let's make this clearer by extending the documentation.
Note: kexec *has to* indicate this memory to the second kernel. With
"movable_node" set, we don't want to place kexec-images on this memory.
Without "movable_node" set, we don't care and can place kexec-images on
this memory. In both cases, after successful memory hotunplug, kexec
has to be re-armed to update the memory map for the second kernel and to
place the kexec-images somewhere else.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Jianyong Wu <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Shahab Vahedi <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Vineet Gupta <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/memory_hotplug: full support for add_memory_driver_managed() with CONFIG_ARCH_KEEP_MEMBLOCK", v2.
Architectures that require CONFIG_ARCH_KEEP_MEMBLOCK=y, such as arm64,
don't cleanly support add_memory_driver_managed() yet. Most
prominently, kexec_file can still end up placing kexec images on such
driver-managed memory, resulting in undesired behavior, for example,
having kexec images located on memory not part of the firmware-provided
memory map.
Teaching kexec to not place images on driver-managed memory is
especially relevant for virtio-mem. Details can be found in commit
7b7b27214bba ("mm/memory_hotplug: introduce
add_memory_driver_managed()").
Extend memblock with a new flag and set it from memory hotplug code
when applicable. This is required to fully support virtio-mem on
arm64, also making kexec_file behave as it does on x86-64.
This patch (of 2):
If memblock_add_node() fails, we're most probably running out of memory.
While this is unlikely to happen, it can happen and having memory added
without a memblock can be problematic for architectures that use
memblock to detect valid memory. Let's fail in a nice way instead of
silently ignoring the error.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Jianyong Wu <[email protected]>
Cc: "Aneesh Kumar K . V" <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Shahab Vahedi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
CONFIG_MEMORY_HOTPLUG was marked BROKEN for over one year and we just
restricted it to 64 bit. Let's remove the unused x86 32 bit
implementation and simplify the Kconfig.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
These functions no longer exist.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We don't support CONFIG_MEMORY_HOTPLUG on 32 bit and consequently not
HIGHMEM. Let's remove any leftover code -- including the unused
"status_change_nid_high" field part of the memory notifier.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
32 bit support is broken in various ways: for example, we can online
memory that should actually go to ZONE_HIGHMEM to ZONE_MOVABLE or in
some cases even to one of the other kernel zones.
We marked it BROKEN in commit b59d02ed0869 ("mm/memory_hotplug: disable
the functionality for 32b") almost one year ago. According to that
commit it might be broken at least since 2017. Further, there is hardly
a sane use case nowadays.
Let's just depend completely on 64bit, dropping the "BROKEN" dependency
to make clear that we are not going to support it again. Next, we'll
remove some HIGHMEM leftovers from memory hotplug code to clean up.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for
CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use
CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Shuah Khan <[email protected]> [kselftest]
Acked-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Oscar Salvador <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm/memory_hotplug: Kconfig and 32 bit cleanups".
Some cleanups around CONFIG_MEMORY_HOTPLUG, including removing 32 bit
leftovers of memory hotplug support.
This patch (of 6):
SPARSEMEM is the only possible memory model for x86-64, FLATMEM is not
possible:
config ARCH_FLATMEM_ENABLE
def_bool y
depends on X86_32 && !NUMA
And X86_64_ACPI_NUMA (obviously) only supports x86-64:
config X86_64_ACPI_NUMA
def_bool y
depends on X86_64 && NUMA && ACPI && PCI
Let's just remove the CONFIG_X86_64_ACPI_NUMA dependency, as it no
longer makes sense.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit e83a437faa62 ("mm/memory_hotplug: introduce "auto-movable" online
policy") introduced a new memory online policy to automatically select a
zone for memory blocks to be onlined. It added a way to set the active
online policy and tunables for the auto-movable online policy.
Follow-up commits tweaked the "auto-movable" policy to also consider
memory device details when selecting zones for memory blocks to be
onlined.
Let's document the new toggles and how the two online policies we have
work.
[[email protected]: updates]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We accidentally added a superfluous "s".
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ac3332c44767 ("memory-hotplug.rst: complete admin-guide overhaul")
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
"movable_node"
Patch series "memory-hotplug.rst: document the "auto-movable" online
policy".
Now that the memory-hotplug.rst overhaul is upstream, let's add proper
documentation for the "auto-movable" online policy, documenting all new
toggles and options. Along with it, two fixes for the original
overhaul.
This patch (of 3):
We really want to refer to the "movable_node" kernel command line
parameter here.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ac3332c44767 ("memory-hotplug.rst: complete admin-guide overhaul")
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
online_policy_to_str is only used in memory_hotplug.c and should be
defined as static.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tang Yizhou <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The madv_populate selftest currently builds with a warning when the
local installed headers (via the distribution) don't include
MADV_POPULATE_READ and MADV_POPULATE_WRITE. The warning is correct,
because the test cannot locate the necessary header.
The reason is that the in-tree installed headers (usr/include) have a
"linux" instead of a "sys" subdirectory.
Including "linux/mman.h" instead of "sys/mman.h" doesn't work (e.g.,
mmap() and madvise() are not defined that way). The only thing that
seems to work is including "linux/mman.h" in addition to "sys/mman.h".
We can get rid of our availability check and simplify.
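That is, the test ends up doing roughly the following (sketch):

	#include <sys/mman.h>	/* mmap(), madvise() */
	#include <linux/mman.h>	/* MADV_POPULATE_READ / MADV_POPULATE_WRITE */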
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reported-by: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
fragmentation_index may return -1000 and the corresponding formatted
value shown by seq_printf will carry a negative sign, but other,
positive formatted values don't carry a positive sign, so the output
becomes unaligned.
before:
Node 0, zone DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 0, zone DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 0, zone Normal -1.000 -1.000 -1.000 -1.000 0.931 0.966 0.983 0.992 0.996 0.998 0.999
after this patch:
Node 0, zone DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 0, zone DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 0, zone Normal -1.000 -1.000 -1.000 -1.000 0.931 0.966 0.983 0.992 0.996 0.998 0.999
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Lin Feng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
KCSAN reports a data-race on v5.10 which also exists on mainline:
BUG: KCSAN: data-race in extfrag_for_order+0x33/0x2d0
race at unknown origin, with read to 0xffff9ee9bfffab48 of 8 bytes by task 34 on cpu 1:
extfrag_for_order+0x33/0x2d0
kcompactd+0x5f0/0xce0
kthread+0x1f9/0x220
ret_from_fork+0x22/0x30
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 34 Comm: kcompactd0 Not tainted 5.10.0+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Access to zone->free_area[order].nr_free in extfrag_for_order() and
frag_show_print() is lockless. That's intentional and the stats are a
rough estimate anyway. Annotate them with data_race().
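The annotation is of this form (illustrative):

	/* lockless read; the stats are a rough estimate by design */
	nr_free = data_race(zone->free_area[order].nr_free);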
[[email protected]: add comments]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Liu Shixin <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a test case that measures KSM merging time using mostly huge pages.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Pedro Demarchi Gomes <[email protected]>
Cc: Zhansaya Bagdauletkyzy <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Platforms can have non-contiguous NUMA nodes like below
#numactl -H
available: 2 nodes (0,8)
.....
node distances:
node 0 8
0: 10 40
8: 40 10
#numactl -H
available: 1 nodes (1)
....
node distances:
node 1
1: 10
Hence update the test to not assume the presence of nodes 0 and 1, and
also use numa_num_configured_nodes() instead of numa_max_node() to
decide whether to skip the test.
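The skip check then becomes something like the below (a hedged sketch;
numa_num_configured_nodes() is libnuma's count of configured nodes
regardless of their IDs):

	if (numa_num_configured_nodes() <= 1) {
		printf("At least 2 NUMA nodes must be available\n");
		return KSFT_SKIP;	/* kselftest skip exit code */
	}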
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 82e717ad3501 ("selftests: vm: add KSM merging across nodes test")
Signed-off-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Pasha Tatashin <[email protected]>
Cc: Zhansaya Bagdauletkyzy <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Tyler Hicks <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|