Age | Commit message | Author | Files | Lines |
|
Commit fb16d891 "kconfig: replace 'oldnoconfig' with 'olddefconfig', and
keep the old name", changed ktest's default config update from
oldnoconfig to olddefconfig without adding oldnoconfig as a backup.
The make oldnoconfig works much better than its backup of:
yes '' | make oldconfig
But due to this change, and the fact that ktest is used to build lots of
older kernels (and for bisects), it forgoes the oldnoconfig completely.
Cc: Adam Lee <[email protected]>
Cc: Michal Marek <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
The old memory hotplug code together with the new online/movable support
may leave an online node with no normal memory, but memory management
misbehaves on nodes that are online yet have no normal memory. For
example, a task bound to such a node may fail all kernel allocations and
then be unable to create new tasks or other kernel objects.
So disallow nodes without normal memory here; we will enable them once
the rest of the code is prepared for it.
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make online_movable/online_kernel able to empty a zone, or to move memory
into an empty zone.
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add online_movable and online_kernel for logical memory hotplug. This is
the dynamic version of "movablecore" & "kernelcore".
The motivation is the same as for introducing "movablecore" &
"kernelcore", but this version is dynamic, working at run time:
o We can configure memory as kernelcore or movablecore after boot.
  When the userspace workload increases and we need more hugepages, we
  can use "online_movable" to add memory and allow the system to use more
  THP (transparent huge pages); vice versa when the kernel workload
  increases. This also helps virtualization to dynamically configure
  host/guest memory and to save memory (reduce waste): memory capacity
  on demand.
o When a new node is physically onlined after boot, we can use
  "online_movable" or "online_kernel" to configure/partition it as
  expected when we logically online it. This configuration also helps
  physical memory migration.
o All the benefits of the existing "movablecore" & "kernelcore".
o Prepares for movable nodes, which are very important for power saving,
  hardware partitioning and highly available systems (hardware fault
  management). (Note: movable nodes are not introduced here.)
Behavior:
When a memory block/section is onlined by "online_movable", the kernel
will hold no direct references to its pages, so we can remove that memory
at any time when needed.
When it is onlined by "online_kernel", the kernel can use it.
When it is onlined by "online", the zone type is not changed.
Current constraint:
Only a memory block adjacent to ZONE_MOVABLE can be onlined from
ZONE_NORMAL to ZONE_MOVABLE.
[[email protected]: use min_t, cleanups]
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use "[index] = init_value" designated initializers, and use the N_xxxxx
constants instead of hardcoded values.
This makes the code more readable and easier to extend with new states.
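For illustration, a minimal sketch of the style being converted to (the
array and its entries here are hypothetical, not the patch's actual
table):
	/* hypothetical example: indices are named node states, not magic numbers */
	static const char * const state_name[] = {
		[N_POSSIBLE] = "possible",
		[N_ONLINE]   = "online",
	};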
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is strange that alloc_bootmem() returns a virtual address while
free_bootmem() requires a physical address. Regardless, free_bootmem()'s
first parameter must be a physical address.
There are some call sites that pass a virtual address to free_bootmem(),
so fix them.
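A minimal sketch of the fix pattern at such a call site (the surrounding
code is illustrative):
	void *p = alloc_bootmem(size);	/* returns a virtual address */
	/* ... */
	free_bootmem(__pa(p), size);	/* free_bootmem() wants a physical address */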
[[email protected]: improve free_bootmem() and free_bootmem_late() documentation]
Signed-off-by: Joonsoo Kim <[email protected]>
Cc: Haavard Skinnemoen <[email protected]>
Cc: Hans-Christian Egtvedt <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: FUJITA Tomonori <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is no code for CONFIG_HAVE_ARCH_BOOTMEM, so remove it.
Signed-off-by: Joonsoo Kim <[email protected]>
Cc: Haavard Skinnemoen <[email protected]>
Cc: Hans-Christian Egtvedt <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: FUJITA Tomonori <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commits 2139cbe627b8 ("cma: fix counting of isolated pages") and
d95ea5d18e69 ("cma: fix watermark checking") introduced a reliable
method of free page accounting when memory is being allocated from CMA
regions, so the workaround introduced earlier by commit 49f223a9cd96
("mm: trigger page reclaim in alloc_contig_range() to stabilise
watermarks") can be finally removed.
Signed-off-by: Marek Szyprowski <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit 2139cbe627b8 ("cma: fix counting of isolated pages"), free
pages in isolated pageblocks are not accounted in the NR_FREE_PAGES
counters, so the watermark check is not required when operating on a free
page in an isolated pageblock.
Signed-off-by: Marek Szyprowski <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Mel Gorman <[email protected]>
Acked-by: Michal Nazarewicz <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
test_set_oom_score_adj() and compare_swap_oom_score_adj() are used to
specify that current should be killed first if an oom condition occurs in
between the two calls.
The usage is:
	short oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
	...
	compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, oom_score_adj);
to store the thread's oom_score_adj, temporarily change it to the maximum
score possible, and then restore the old value if it is still the same.
This is still racy, however, if the user writes OOM_SCORE_ADJ_MAX to
/proc/pid/oom_score_adj in between the two calls:
compare_swap_oom_score_adj() will then incorrectly restore the old value,
discarding the user's write of OOM_SCORE_ADJ_MAX.
To fix this, introduce a new oom_flags_t member in struct signal_struct
that will be used for per-thread oom killer flags. KSM and swapoff can
now use a bit in this member to specify that threads should be killed
first in oom conditions without playing around with oom_score_adj.
This also allows the correct oom_score_adj to always be shown when reading
/proc/pid/oom_score.
Signed-off-by: David Rientjes <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Anton Vorontsov <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The maximum oom_score_adj is 1000 and the minimum oom_score_adj is -1000,
so this range can be represented by the signed short type with no
functional change. The extra space this frees up in struct signal_struct
will be used for per-thread oom kill flags in the next patch.
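For reference, the range in question (constants from include/linux/oom.h)
fits comfortably in a signed short:
	#define OOM_SCORE_ADJ_MIN	(-1000)
	#define OOM_SCORE_ADJ_MAX	1000

	short oom_score_adj;	/* a 16-bit short spans [-32768, 32767] */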
Signed-off-by: David Rientjes <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Anton Vorontsov <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
register_node() is declared as extern in include/linux/node.h, but the
function is only called from register_one_node() in drivers/base/node.c.
So define register_node() as static.
Signed-off-by: Yasuaki Ishimatsu <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove some duplicate code and simplify alloc_pages_vma(). No functional
change.
Signed-off-by: David Rientjes <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
kswapd()->try_to_freeze() is defined to return a boolean, so it's better
to use a bool to hold its return value.
Signed-off-by: Jie Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The PATCH "mm: introduce compaction and migration for virtio ballooned pages"
hacks around putback_lru_pages() in order to allow ballooned pages to be
re-inserted on balloon page list as if a ballooned page was like a LRU page.
As ballooned pages are not legitimate LRU pages, this patch introduces
putback_movable_pages() to properly cope with cases where the isolated
pageset contains ballooned pages and LRU pages, thus fixing the mentioned
inelegant hack around putback_lru_pages().
Signed-off-by: Rafael Aquini <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a
guest, imposing performance penalties associated with the reduced number
of transparent huge pages that could be used by the guest workload.
Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme, in order to
improve the synchronization of accesses to elements of struct
virtio_balloon and thus provide protection against the concurrent
accesses introduced by parallel memory migration threads:
- balloon_lock (mutex): serializes access to the elements of struct
  virtio_balloon and to its queue operations;
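A rough sketch of the scheme (illustrative only; details may differ from
the actual driver code):
	/* any path touching struct virtio_balloon's page list or its
	 * virtqueues does so under the new mutex */
	mutex_lock(&vb->balloon_lock);
	/* ... update vb->pages, kick vb->inflate_vq / vb->deflate_vq ... */
	mutex_unlock(&vb->balloon_lock);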
[[email protected]: fix missing unlock on error in fill_balloon()]
[[email protected]: avoid having multiple return points in fill_balloon()]
[[email protected]: fix printk warning]
Signed-off-by: Rafael Aquini <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Wei Yongjun <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a
guest, imposing performance penalties associated with the reduced number
of transparent huge pages that could be used by the guest workload.
This patch introduces the helper functions, as well as the necessary
changes, to teach the compaction and migration bits how to cope with
pages which are part of a guest memory balloon, in order to make them
movable by memory compaction procedures.
Signed-off-by: Rafael Aquini <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a
guest, imposing performance penalties associated with the reduced number
of transparent huge pages that could be used by the guest workload.
This patch introduces a common interface to help a balloon driver make
its page set movable for compaction, thus allowing the system to better
leverage the compaction efforts on memory defragmentation.
[[email protected]: use PAGE_FLAGS_CHECK_AT_PREP, s/__balloon_page_flags/page_flags_cleared/, small cleanups]
[[email protected]: allow balloon compaction for any system with memory compaction enabled, which is the defconfig]
Signed-off-by: Rafael Aquini <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Overhaul struct address_space.assoc_mapping, renaming it to
address_space.private_data and redefining its type to void *. This way
we consistently name the .private_* elements of struct address_space, and
we allow extended use of address_space association with other data
structures through ->private_data.
Also, all users of the old ->assoc_mapping element are converted to
reflect its new name and type change (->private_data).
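A hedged sketch of the conversion:
	struct address_space {
		/* ... */
		void *private_data;	/* was: struct address_space *assoc_mapping */
	};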
Signed-off-by: Rafael Aquini <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Memory fragmentation introduced by ballooning might significantly reduce
the number of 2MB contiguous memory blocks that can be used within a
guest, imposing performance penalties associated with the reduced number
of transparent huge pages that could be used by the guest workload.
This patch set follows the main idea discussed at the 2012 LSF/MM Summit
session "Ballooning for transparent huge pages" --
http://lwn.net/Articles/490114/ -- and introduces the required changes to
the virtio_balloon driver, as well as changes to the core compaction &
migration bits, in order to make those subsystems aware of ballooned
pages and to allow memory balloon pages to become movable within a guest,
thus avoiding the aforementioned fragmentation issue.
The numbers below show how this series helps compaction to be more
effective in memory-ballooned guests.
Results for the STRESS-HIGHALLOC benchmark, from Mel Gorman's mmtests
suite, running on a 4GB RAM KVM guest which was ballooning 512MB of RAM
in 64MB chunks once a minute (inflating/deflating) while the test was
running:
===BEGIN stress-highalloc
STRESS-HIGHALLOC
highalloc-3.7 highalloc-3.7
rc4-clean rc4-patch
Pass 1 55.00 ( 0.00%) 62.00 ( 7.00%)
Pass 2 54.00 ( 0.00%) 62.00 ( 8.00%)
while Rested 75.00 ( 0.00%) 80.00 ( 5.00%)
MMTests Statistics: duration
3.7 3.7
rc4-clean rc4-patch
User 1207.59 1207.46
System 1300.55 1299.61
Elapsed 2273.72 2157.06
MMTests Statistics: vmstat
3.7 3.7
rc4-clean rc4-patch
Page Ins 3581516 2374368
Page Outs 11148692 10410332
Swap Ins 80 47
Swap Outs 3641 476
Direct pages scanned 37978 33826
Kswapd pages scanned 1828245 1342869
Kswapd pages reclaimed 1710236 1304099
Direct pages reclaimed 32207 31005
Kswapd efficiency 93% 97%
Kswapd velocity 804.077 622.546
Direct efficiency 84% 91%
Direct velocity 16.703 15.682
Percentage direct scans 2% 2%
Page writes by reclaim 79252 9704
Page writes file 75611 9228
Page writes anon 3641 476
Page reclaim immediate 16764 11014
Page rescued immediate 0 0
Slabs scanned 2171904 2152448
Direct inode steals 385 2261
Kswapd inode steals 659137 609670
Kswapd skipped wait 1 69
THP fault alloc 546 631
THP collapse alloc 361 339
THP splits 259 263
THP fault fallback 98 50
THP collapse fail 20 17
Compaction stalls 747 499
Compaction success 244 145
Compaction failures 503 354
Compaction pages moved 370888 474837
Compaction move failure 77378 65259
===END stress-highalloc
This patch:
Introduce MIGRATEPAGE_SUCCESS as the default return code for the
address_space_operations.migratepage() method, and document the expected
return codes of that method in failure cases.
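For illustration, a migratepage implementation now signals success with
the named constant rather than a bare 0 (hypothetical callback):
	static int example_migratepage(struct address_space *mapping,
				       struct page *newpage, struct page *page,
				       enum migrate_mode mode)
	{
		/* ... move page contents and metadata ... */
		return MIGRATEPAGE_SUCCESS;	/* instead of returning 0 */
	}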
Signed-off-by: Rafael Aquini <[email protected]>
Cc: Rusty Russell <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Consistently spell this word across arch/sparc/mm and arch/sparc/kernel.
Acked-by: David Miller <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the sparc64 hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the sparc64 arch_get_unmapped_area[_topdown] functions to make use
of vm_unmapped_area() instead of implementing a brute force search.
[[email protected]: remove now-unused COLOUR_ALIGN_DOWN()]
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the tile hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
[[email protected]: fix build]
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the sparc32 arch_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
[[email protected]: fix build]
[[email protected]: remove now-unused COLOUR_ALIGN()]
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Acked-by: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the sh arch_get_unmapped_area[_topdown] functions to make use of
vm_unmapped_area() instead of implementing a brute force search.
[[email protected]: remove now-unused COLOUR_ALIGN_DOWN()]
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the arm arch_get_unmapped_area[_topdown] functions to make use of
vm_unmapped_area() instead of implementing a brute force search.
[[email protected]: remove now-unused COLOUR_ALIGN_DOWN()]
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the mips arch_get_unmapped_area[_topdown] functions to make use of
vm_unmapped_area() instead of implementing a brute force search.
[[email protected]: remove now-unused COLOUR_ALIGN_DOWN()]
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the i386 hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
[[email protected]: fix build]
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix the x86-64 cache alignment code to take pgoff into account. Use the
x86 and MIPS cache alignment code as the basis for a generic cache
alignment function.
The old x86 code would always align the mmap to aliasing boundaries,
even if the program mmaps the file with a non-zero pgoff.
If program A mmaps the file with pgoff 0 and program B mmaps the file
with pgoff 1, the old code would align the mmaps, resulting in misaligned
pages:
	A: 0123
	B: 123
After this patch, they are aligned so the pages line up:
	A: 0123
	B:  123
Proposed by Rik van Riel.
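In terms of the vm_unmapped_area_info interface introduced earlier in
this series, the idea is roughly the following (a sketch, not the exact
patch; align_mask stands in for the arch's aliasing-boundary mask):
	info.align_mask = align_mask;			/* align to aliasing boundaries */
	info.align_offset = pgoff << PAGE_SHIFT;	/* ...biased by the file offset */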
Signed-off-by: Michel Lespinasse <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use
of vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Implement vm_unmapped_area() using the rb_subtree_gap and highest_vm_end
information to look up suitable virtual address space gaps.
struct vm_unmapped_area_info is used to define the desired allocation
request:
- lowest or highest possible address matching the remaining constraints
- desired gap length
- low/high address limits that the gap must fit into
- alignment mask and offset
Also update the generic arch_get_unmapped_area[_topdown] functions to make
use of vm_unmapped_area() instead of implementing a brute force search.
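A minimal sketch of a caller (bottom-up case; the limits shown are
illustrative):
	struct vm_unmapped_area_info info;
	unsigned long addr;

	info.flags = 0;				/* or VM_UNMAPPED_AREA_TOPDOWN */
	info.length = len;			/* desired gap length */
	info.low_limit = TASK_UNMAPPED_BASE;	/* gap must lie above this... */
	info.high_limit = TASK_SIZE;		/* ...and below this */
	info.align_mask = 0;			/* no special alignment */
	info.align_offset = 0;
	addr = vm_unmapped_area(&info);		/* lowest suitable address, or error */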
[[email protected]: checkpatch fixes]
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kernel walks the VMA rbtree in various places, including the page
fault path. However, the vm_rb node spanned two cache lines, on 64 bit
systems with 64 byte cache lines (most x86 systems).
Rearrange vm_area_struct a little, so all the information we need to do a
VMA tree walk is in the first cache line.
[[email protected]: checkpatch fixes]
Signed-off-by: Michel Lespinasse <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When CONFIG_DEBUG_VM_RB is enabled, check that rb_subtree_gap is correctly
set for every vma and that mm->highest_vm_end is also correct.
Also add an explicit 'bug' variable to track if browse_rb() detected any
invalid condition.
[[email protected]: repair innovative coding-style inventions]
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Define vma->rb_subtree_gap as the largest gap between any vma in the
subtree rooted at that vma, and their predecessor. Or, for a recursive
definition, vma->rb_subtree_gap is the max of:
- vma->vm_start - vma->vm_prev->vm_end
- the rb_subtree_gap fields of the vmas pointed to by vma->rb.rb_left
  and vma->rb.rb_right
This will allow get_unmapped_area_* to find a free area of the right
size in O(log(N)) time, instead of potentially having to do a linear
walk across all the VMAs.
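A hedged sketch of the per-vma computation implied by the recursive
definition above (close to, but not necessarily identical to, the patch's
helper):
	static unsigned long vma_gap(struct vm_area_struct *vma)
	{
		/* gap between this vma and its predecessor */
		unsigned long max = vma->vm_start -
			(vma->vm_prev ? vma->vm_prev->vm_end : 0);

		/* fold in the cached gaps of both rbtree subtrees */
		if (vma->vm_rb.rb_left) {
			unsigned long gap = rb_entry(vma->vm_rb.rb_left,
				struct vm_area_struct, vm_rb)->rb_subtree_gap;
			if (gap > max)
				max = gap;
		}
		if (vma->vm_rb.rb_right) {
			unsigned long gap = rb_entry(vma->vm_rb.rb_right,
				struct vm_area_struct, vm_rb)->rb_subtree_gap;
			if (gap > max)
				max = gap;
		}
		return max;
	}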
Also define mm->highest_vm_end as the vm_end field of the highest vma,
so that we can easily check if the following gap is suitable.
This does have the potential to make unmapping VMAs more expensive,
especially for processes with very large numbers of VMAs, where the VMA
rbtree can grow quite deep.
Signed-off-by: Michel Lespinasse <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Russell King <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Also remove -Wextra because gcc-4.6 emits lots of irritating
signed/unsigned comparison warnings.
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There was some desire in large applications using MAP_HUGETLB or
SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
others. This is useful together with NUMA policy: use 2MB interleaving
on some mappings, but 1GB on local mappings.
This patch extends the IPC/SHM syscall interfaces slightly to allow
specifying the page size.
It borrows some upper bits in the existing flag arguments and allows
encoding the log of the desired page size in addition to the *_HUGETLB
flag. When 0 is specified the default size is used; this makes the
change fully compatible.
Extending the internal hugetlb code to handle this is straightforward.
Instead of a single mount it just keeps an array of them and selects the
right mount based on the specified page size. When no page size is
specified it uses the mount of the default page size.
The change is not visible in /proc/mounts because internal mounts don't
appear there. It also has very little overhead: the additional mounts
just consume a super block, but not more memory when not used.
I also exported the new flags to the user headers (they were previously
under __KERNEL__). Right now only symbols for 1GB and 2MB are defined
for x86 and some other architectures. The interface should already work
for all other architectures, though. Only architectures that define
multiple hugetlb sizes actually need it (currently x86, tile, powerpc).
However, tile and powerpc have user-configurable hugetlb sizes, so it's
not easy to add defines; a program on those architectures would need to
query sysfs and use the appropriate log2.
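A hedged userspace sketch of asking for 1GB pages via shmget(); the
SHM_HUGE_* values below follow the encoding described above and are
assumptions if your headers do not define them:
	#include <sys/ipc.h>
	#include <sys/shm.h>

	#define SHM_HUGE_SHIFT	26			/* assumed location of the size bits */
	#define SHM_HUGE_1GB	(30 << SHM_HUGE_SHIFT)	/* log2(1GB) == 30 */

	int main(void)
	{
		int id = shmget(IPC_PRIVATE, 1UL << 30,
				SHM_HUGETLB | SHM_HUGE_1GB | IPC_CREAT | 0600);
		return id < 0;
	}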
[[email protected]: cleanups]
[[email protected]: fix build]
[[email protected]: checkpatch fixes]
Signed-off-by: Andi Kleen <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hillf Danton <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
action_result() fails to print out "dirty" even if an error occurred on
a dirty pagecache page, because by the time we check PageDirty in
action_result() the flag has already been cleared by page isolation, even
if the page was dirty before error handling. This can break applications
that monitor this message, so it should be fixed.
There are several callers of action_result() besides page_action(), but
none of them operate on LRU pages; they handle free pages or kernel
pages, so we don't have to consider dirtiness for them.
Note that PG_dirty can be set outside page locks, as described in commit
6746aff74da2 ("HWPOISON: shmem: call set_page_dirty() with locked
page"), so this patch does not completely close the race window; it
just narrows it.
Signed-off-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: "Jun'ichi Nomura" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This can help catch the case where hardware keeps writing to DMA memory
after it has been freed.
[[email protected]: tidy code, fix comment, use sizeof(page->offset), use pr_err()]
Signed-off-by: Matthieu Castet <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Exiting threads, those with PF_EXITING set, can pagefault and require
memory before they can make forward progress. This happens, for instance,
when a process must fault task->robust_list, a userspace structure, before
detaching its memory.
These threads also aren't guaranteed to get access to memory reserves
unless oom killed or killed from userspace. The oom killer won't grant
memory reserves if other threads are also exiting other than current and
stalling at the same point. This prevents needlessly killing processes
when others are already exiting.
Instead of special-casing all the possible situations between PF_EXITING
being set and a thread detaching its mm where it may allocate memory
(special cases which probably wouldn't get updated when a change is made
to the exit path), the solution is to give all exiting threads access to
memory reserves if they call the oom killer. This allows them to quickly
allocate memory, detach their mm, and free the memory it represents.
Summary of Luigi's bug report:
: He had an oom condition where threads were faulting on task->robust_list
: and repeatedly called the oom killer but it would defer killing a thread
: because it saw other PF_EXITING threads. This can happen anytime we need
: to allocate memory after setting PF_EXITING and before detaching our mm;
: if there are other threads in the same state then the oom killer won't do
: anything unless one of them happens to be killed from userspace.
:
: So instead of only deferring for PF_EXITING and !task->robust_list, it's
: better to just give them access to memory reserves to prevent a potential
: livelock so that any other faults that may be introduced in the future in
: the exit path don't cause the same problem (and hopefully we don't allow
: too many of those!).
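The gist of the change in out_of_memory() is roughly (hedged sketch):
	/* exiting threads get TIF_MEMDIE, i.e. access to memory reserves,
	 * so they can finish detaching their mm instead of livelocking */
	if (fatal_signal_pending(current) || current->flags & PF_EXITING) {
		set_thread_flag(TIF_MEMDIE);
		return;
	}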
Signed-off-by: David Rientjes <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Tested-by: Luigi Semenzato <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
mem_cgroup_charge_common() is invoked as the entry point for cgroup
limits charging rather than mem_cgroup_charge(), as the latter was
removed years ago. Update cgroup/memory.txt to reflect this change.
Signed-off-by: Jie Liu <[email protected]>
Cc: Ying Han <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
On x86 memory accesses to pages without the ACCESSED flag set result in
the ACCESSED flag being set automatically. With the ARM architecture a
page access fault is raised instead (and it will continue to be raised
until the ACCESSED flag is set for the appropriate PTE/PMD).
For normal memory pages, handle_pte_fault will call pte_mkyoung
(effectively setting the ACCESSED flag). For transparent huge pages,
pmd_mkyoung will only be called for a write fault.
This patch ensures that faults on transparent hugepages which do not
result in a CoW update the access flags for the faulting pmd.
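A rough sketch of the fix (not necessarily the exact patch) for a
non-write fault on a transparent huge page:
	pmd_t entry = pmd_mkyoung(orig_pmd);

	if (pmdp_set_access_flags(vma, haddr, pmdp, entry, 0 /* !dirty */))
		update_mmu_cache_pmd(vma, haddr, pmdp);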
Signed-off-by: Will Deacon <[email protected]>
Cc: Chris Metcalf <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Ni zhan Chen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In flush_all_zero_pkmaps(), we have the index of the pkmap associated
with the page, and from this index we can simply derive the virtual
address of the page. So change the code to do that.
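That is, given the pkmap index i, the address follows directly from the
existing macro (sketch):
	unsigned long addr = PKMAP_ADDR(i);	/* instead of deriving it from the page */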
Signed-off-by: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We can find a free page_address_map instance without the
page_address_pool, so remove the pool.
Signed-off-by: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The pool_lock protects the page_address_pool from concurrent access. But,
access to the page_address_pool is already protected by kmap_lock. So
remove it.
Signed-off-by: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Using PKMAP_NR() to calculate the index of a pkmap entry is more
understandable and maintainable, so switch to it.
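That is (sketch):
	/* derive the pkmap index from a kmap virtual address */
	int i = PKMAP_NR(addr);	/* instead of open-coding (addr - PKMAP_BASE) >> PAGE_SHIFT */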
Signed-off-by: Joonsoo Kim <[email protected]>
Cc: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The call to frontswap_init() was added within enable_swap_info(), which
was called not only during sys_swapon, but also to reinsert the swap_info
into the swap_list in case of failure of try_to_unuse() within
sys_swapoff. This means that frontswap_init() might be called more than
once for the same swap area.
While as far as I could see no frontswap implementation has any problem
with it (and in fact, all the ones I found ignore the parameter passed to
frontswap_init), this could change in the future.
To prevent future problems, move the call to frontswap_init() to outside
the code shared between sys_swapon and sys_swapoff.
Signed-off-by: Cesar Eduardo Barros <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Acked-by: Dan Magenheimer <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The block within sys_swapoff() which re-inserts the swap_info into the
swap_list in case of failure of try_to_unuse() reads a few values outside
the swap_lock. While this is safe at that point, it is subtle code.
Simplify the code by moving the reading of these values to a separate
function, refactoring it a bit so they are read from within the swap_lock.
This is easier to understand, and matches better the way it worked before
I unified the insertion of the swap_info from both sys_swapon and
sys_swapoff.
This change should make no functional difference. The only real change is
moving the read of two or three structure fields to within the lock
(frontswap_map_get() is nothing more than a read of p->frontswap_map).
Signed-off-by: Cesar Eduardo Barros <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Dan Magenheimer <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If we have more inactive file pages than active file pages, we skip
scanning the active file pages altogether, with the idea that we do not
want to evict the working set when there is plenty of streaming IO in the
cache.
However, the code forgot to also skip scanning anonymous pages in that
situation. That leads to the curious situation of keeping the active file
pages protected from being paged out when there are lots of inactive file
pages, while still scanning and evicting anonymous pages.
This patch fixes that situation by only evicting file pages when we have
plenty of them and most of them are inactive.
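A hedged sketch of the resulting decision (names approximate the
scan-balance logic in get_scan_count()):
	/* Plenty of inactive file pages: reclaim file pages only and
	 * leave the anonymous working set alone. */
	if (!inactive_file_is_low(lruvec)) {
		fraction[0] = 0;	/* anon */
		fraction[1] = 1;	/* file */
		denominator = 1;
		goto out;
	}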
[[email protected]: adjust comment layout]
Signed-off-by: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|