|
This function is only called during initialization.
Signed-off-by: Luiz Capitulino <[email protected]>
Cc: Andi Kleen <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
No reason to duplicate the code of an existing macro.
Signed-off-by: Luiz Capitulino <[email protected]>
Cc: Andi Kleen <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting
similar data, one for file backed pages and the other for anon memory. To
this end, this lock can also be a rwsem. In addition, there are some
important opportunities to share the lock when there are no tree
modifications.
This conversion is straightforward. For now, all users take the write
lock.
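Concretely, the write-side helpers after this conversion look roughly like this (a sketch, assuming the field is renamed i_mmap_rwsem; not a literal hunk from the patch):
	static inline void i_mmap_lock_write(struct address_space *mapping)
	{
		down_write(&mapping->i_mmap_rwsem);
	}

	static inline void i_mmap_unlock_write(struct address_space *mapping)
	{
		up_write(&mapping->i_mmap_rwsem);
	}
Read-side sharing can then be added later with down_read()-based helpers in paths that do no tree modifications.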
[[email protected]: update fremap.c]
Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Acked-by: "Kirill A. Shutemov" <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Convert all open coded mutex_lock/unlock calls to the
i_mmap_[lock/unlock]_write() helpers.
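A typical call site changes along these lines (illustrative sketch, not a specific hunk):
	/* before */
	mutex_lock(&mapping->i_mmap_mutex);
	/* ... walk or modify the interval tree ... */
	mutex_unlock(&mapping->i_mmap_mutex);

	/* after */
	i_mmap_lock_write(mapping);
	/* ... walk or modify the interval tree ... */
	i_mmap_unlock_write(mapping);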
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Acked-by: "Kirill A. Shutemov" <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup update from Tejun Heo:
"cpuset got simplified a bit. cgroup core got a fix on unified
hierarchy and grew some effective css related interfaces which will be
used for blkio support for writeback IO traffic which is currently
being worked on"
* 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: implement cgroup_get_e_css()
cgroup: add cgroup_subsys->css_e_css_changed()
cgroup: add cgroup_subsys->css_released()
cgroup: fix the async css offline wait logic in cgroup_subtree_control_write()
cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write()
cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask()
cpuset: lock vs unlock typo
cpuset: simplify cpuset_node_allowed API
cpuset: convert callback_mutex to a spinlock
|
|
First, after flushing the TLB, there is no need to scan the ptes from the
start again. Second, before bailing out of the loop, the address is
advanced one step.
Signed-off-by: Hillf Danton <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The current cpuset API for checking whether a zone/node is allowed to
allocate from looks rather awkward. We have hardwall and softwall versions of
cpuset_node_allowed with the softwall version doing literally the same
as the hardwall version if __GFP_HARDWALL is passed to it in gfp flags.
If it isn't, the softwall version may check the given node against the
enclosing hardwall cpuset, which it needs to take the callback lock to
do.
Such a distinction was introduced by commit 02a0e53d8227 ("cpuset:
rework cpuset_zone_allowed api"). Before, we had the only version with
the __GFP_HARDWALL flag determining its behavior. The purpose of the
commit was to avoid sleep-in-atomic bugs when someone would mistakenly
call the function without the __GFP_HARDWALL flag for an atomic
allocation. The suffixes introduced were intended to make the callers
think before using the function.
However, since the callback lock was converted from mutex to spinlock by
the previous patch, the softwall check function cannot sleep, and these
precautions are no longer necessary.
So let's simplify the API back to the single check.
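The resulting caller pattern is a single check, roughly (a sketch; __GFP_HARDWALL in the gfp mask is assumed to select the stricter nearest-hardwall behavior inside the one function):
	/* one entry point instead of *_softwall()/*_hardwall() variants */
	if (!cpuset_node_allowed(node, gfp_mask))
		continue;	/* this node is off-limits in the current cpuset */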
Suggested-by: David Rientjes <[email protected]>
Signed-off-by: Vladimir Davydov <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Acked-by: Zefan Li <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Trivially convert a few VM_BUG_ON calls to VM_BUG_ON_VMA to extract
more information when they trigger.
[[email protected]: coding-style fixes]
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is possible for some platforms, such as powerpc, to set HPAGE_SHIFT to
0 to indicate that huge pages are not supported.
When this is the case, hugetlbfs could be disabled during boot time:
hugetlbfs: disabling because there are no supported hugepage sizes
Then in dissolve_free_huge_pages(), order is kept at its maximum (64 for
64 bits), and the for loop below won't end: for (pfn = start_pfn; pfn <
end_pfn; pfn += 1 << order)
As suggested by Naoya, the fix below checks hugepages_supported() before
calling dissolve_free_huge_pages().
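The shape of the fix is a simple guard at the call site (sketch; the surrounding hotplug code is elided):
	/* never walk hstates when no hugepage size is supported,
	 * otherwise the pfn loop above never terminates */
	if (hugepages_supported())
		dissolve_free_huge_pages(start_pfn, end_pfn);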
[[email protected]: no legitimate reason to call dissolve_free_huge_pages() when !hugepages_supported()]
Signed-off-by: Li Zhong <[email protected]>
Acked-by: Naoya Horiguchi <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: David Rientjes <[email protected]>
Cc: <[email protected]> [3.12+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
They are unnecessary: "zero" can be used in place of "hugetlb_zero" and
passing extra2 == NULL is equivalent to infinity.
Signed-off-by: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Luiz Capitulino <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Three different interfaces alter the maximum number of hugepages for an
hstate:
- /proc/sys/vm/nr_hugepages for global number of hugepages of the default
hstate,
- /sys/kernel/mm/hugepages/hugepages-X/nr_hugepages for global number of
hugepages for a specific hstate, and
- /sys/kernel/mm/hugepages/hugepages-X/nr_hugepages/mempolicy for number of
hugepages for a specific hstate over the set of allowed nodes.
Generalize the code so that a single function handles all of these
writes instead of duplicating the code in two different functions.
This decreases the number of lines of code, but also reduces the size of
.text by about half a percent since set_max_huge_pages() can be inlined.
Signed-off-by: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Luiz Capitulino <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When returning from hugetlb_cow(), we always (1) put back the refcount
for each referenced page -- always 'old', and 'new' if the allocation was
successful -- and (2) retake the page table lock right before returning,
as the callers expect. This logic can be simplified and encapsulated,
as proposed in this patch. In addition to cleaner code, we also shave a
few bytes off the instruction text:
   text    data     bss     dec     hex filename
  28399     462   41328   70189   1122d mm/hugetlb.o-baseline
  28367     462   41328   70157   1120d mm/hugetlb.o-patched
Passes libhugetlbfs testcases.
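The encapsulated exit path ends up looking roughly like this (a sketch; the label names are illustrative):
	out_release_all:
		page_cache_release(new_page);
	out_release_old:
		page_cache_release(old_page);

		spin_lock(ptl);	/* callers expect the page table lock held */
		return ret;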
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This function always returns 1, so there is no need to check the return
value in hugetlb_cow(). By doing so, we can get rid of the unnecessary
WARN_ON call. While this logic perhaps existed as a way of identifying
future unmap_ref_private() mishandling, the reality is that it serves no
apparent purpose.
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Aswin Chandramouleeswaran <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56bfe1
("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still needs
another symbol to filter *hugetlbfs* pages.
If a user hopes to filter user pages, makedumpfile tries to exclude them by
checking whether the page is anonymous, but hugetlbfs pages
aren't anonymous even though they are also user pages.
We know it's possible to detect them in the same way as PageHuge(),
so we need the start address of free_huge_page():
	int PageHuge(struct page *page)
	{
		if (!PageCompound(page))
			return 0;

		page = compound_head(page);
		return get_compound_page_dtor(page) == free_huge_page;
	}
For that reason, this patch makes free_huge_page() public in order to
export it to VMCOREINFO.
Signed-off-by: Atsushi Kumagai <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry") changed the order of
huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
in some workloads like hugepage-backed heap allocation via libhugetlbfs.
This patch fixes it.
The test program for the problem is shown below:
$ cat heap.c
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

#define HPS 0x200000

int main() {
	int i;
	char *p = malloc(HPS);

	memset(p, '1', HPS);
	for (i = 0; i < 5; i++) {
		if (!fork()) {
			memset(p, '2', HPS);
			p = malloc(HPS);
			memset(p, '3', HPS);
			free(p);
			return 0;
		}
	}
	sleep(1);
	free(p);
	return 0;
}
$ export HUGETLB_MORECORE=yes ; export HUGETLB_NO_PREFAULT= ; hugectl --heap ./heap
This fixes 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry"), so it is applicable to -stable kernels which
include it.
Signed-off-by: Naoya Horiguchi <[email protected]>
Reported-by: Guillaume Morin <[email protected]>
Suggested-by: Guillaume Morin <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: <[email protected]> [2.6.37+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There's a race between fork() and hugepage migration; as a result we try
to "dereference" a swap entry as a normal pte, causing a kernel panic.
The cause of the problem is that copy_hugetlb_page_range() can't handle
the "swap entry" family (migration entries and hwpoisoned entries), so
let's fix it.
[[email protected]: coding-style fixes]
Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: <[email protected]> [2.6.37+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We already have a function named hugepages_supported(), and the similar
name hugepage_migration_support() is a bit uncomfortable, so let's rename
it to hugepage_migration_supported().
Signed-off-by: Naoya Horiguchi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
alloc_huge_page() currently mixes the normal code path with error
handling logic. This patch moves the error handling logic out, to make
the normal code path cleaner and reduce code duplication.
Signed-off-by: Jianyu Zhan <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
HugeTLB is limited to allocating hugepages whose size is less than
MAX_ORDER order. This is because HugeTLB allocates hugepages via the
buddy allocator. Gigantic pages (that is, pages whose size is greater
than MAX_ORDER order) have to be allocated at boottime.
However, boottime allocation has at least two serious problems. First,
it doesn't support NUMA and second, gigantic pages allocated at boottime
can't be freed.
This commit solves both issues by adding support for allocating gigantic
pages during runtime. It works just like regular sized hugepages,
meaning that the interface in sysfs is the same, it supports NUMA, and
gigantic pages can be freed.
For example, on x86_64 gigantic pages are 1GB big. To allocate two 1G
gigantic pages on node 1, one can do:
# echo 2 > \
/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
And to free them all:
# echo 0 > \
/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
The one problem with gigantic page allocation at runtime is that it
can't be serviced by the buddy allocator. To overcome that problem,
this commit scans all zones from a node looking for a large enough
contiguous region. When one is found, it's allocated by using CMA, that
is, we call alloc_contig_range() to do the actual allocation. For
example, on x86_64 we scan all zones looking for a 1GB contiguous
region. When one is found, it's allocated by alloc_contig_range().
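A sketch of that scan-and-allocate loop (the validity-check helper is this commit's, but treat the details as illustrative):
	static struct page *alloc_gigantic_page(int nid, unsigned int order)
	{
		unsigned long nr_pages = 1UL << order;
		unsigned long pfn;
		struct zone *z;

		for_each_zone(z) {
			if (zone_to_nid(z) != nid)
				continue;
			pfn = ALIGN(z->zone_start_pfn, nr_pages);
			for (; pfn + nr_pages <= zone_end_pfn(z); pfn += nr_pages) {
				if (!pfn_range_valid_gigantic(pfn, nr_pages))
					continue;
				/* CMA-style grab of the contiguous range */
				if (!alloc_contig_range(pfn, pfn + nr_pages,
							MIGRATE_MOVABLE))
					return pfn_to_page(pfn);
			}
		}
		return NULL;
	}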
One expected issue with that approach is that such gigantic contiguous
regions tend to vanish as runtime goes by. The best way to avoid this
for now is to make gigantic page allocations very early during system
boot, say, from an init script. Other possible optimizations include using
compaction, which is supported by CMA but is not explicitly used by this
commit.
It's also important to note the following:
1. Gigantic pages allocated at boottime by the hugepages= command-line
option can be freed at runtime just fine
2. This commit adds support for gigantic pages only to x86_64. The
reason is that I don't have access to nor experience with other archs.
The code is arch-independent though, so it should be simple to add
support for other archs
3. I didn't add support for hugepage overcommit, that is, allocating
a gigantic page on demand when
/proc/sys/vm/nr_overcommit_hugepages > 0. The reason is that I don't
think it's reasonable to do the hard and long work required for
allocating a gigantic page at fault time. But it should be simple
to add this if wanted
[[email protected]: coding-style fixes]
Signed-off-by: Luiz Capitulino <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Zhang Yanfei <[email protected]>
Reviewed-by: Yasuaki Ishimatsu <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Next commit will add new code which will want to call
for_each_node_mask_to_alloc() macro. Move it, its buddy
for_each_node_mask_to_free() and their dependencies up in the file so the
new code can use them. This is just code movement, no logic change.
Signed-off-by: Luiz Capitulino <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Yasuaki Ishimatsu <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Zhang Yanfei <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Hugepages never get the PG_reserved bit set, so don't clear it.
Note, however, that if the bit gets mistakenly set, free_pages_check()
will catch it.
Signed-off-by: Luiz Capitulino <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Zhang Yanfei <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Luiz Capitulino <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Yasuaki Ishimatsu <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Zhang Yanfei <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The HugeTLB subsystem uses the buddy allocator to allocate hugepages
during runtime. This means that hugepages allocation during runtime is
limited to MAX_ORDER order. For archs supporting gigantic pages (that
is, page sizes greater than MAX_ORDER), this in turn means that those
pages can't be allocated at runtime.
HugeTLB supports gigantic page allocation during boottime, via the boot
allocator. To this end the kernel provides the command-line options
hugepagesz= and hugepages=, which can be used to instruct the kernel to
allocate N gigantic pages during boot.
For example, x86_64 supports 2M and 1G hugepages, but only 2M hugepages
can be allocated and freed at runtime. If one wants to allocate 1G
gigantic pages, this has to be done at boot via the hugepagesz= and
hugepages= command-line options.
Now, gigantic page allocation at boottime has two serious problems:
1. Boottime allocation is not NUMA aware. On a NUMA machine the kernel
evenly distributes boottime allocated hugepages among nodes.
For example, suppose you have a four-node NUMA machine and want
to allocate four 1G gigantic pages at boottime. The kernel will
allocate one gigantic page per node.
On the other hand, we do have users who want to be able to specify
which NUMA node gigantic pages should be allocated from, so that they
can place virtual machines on a specific NUMA node.
2. Gigantic pages allocated at boottime can't be freed
At this point it's important to observe that regular hugepages allocated
at runtime don't have those problems. This is because the HugeTLB
interface for runtime allocation in sysfs supports NUMA, and runtime
allocated pages can be freed just fine via the buddy allocator.
This series adds support for allocating gigantic pages at runtime. It
does so by allocating gigantic pages via CMA instead of the buddy
allocator. Releasing gigantic pages is also supported via CMA. As this
series builds on top of the existing HugeTLB interface, it makes gigantic
page allocation and releasing just like regular sized hugepages. This
also means that NUMA support just works.
For example, to allocate two 1G gigantic pages on node 1, one can do:
# echo 2 > \
/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
And, to release all gigantic pages on the same node:
# echo 0 > \
/sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
Please, refer to patch 5/5 for full technical details.
Finally, please note that this series is a follow up for a previous series
that tried to extend the command-line options set to be NUMA aware:
http://marc.info/?l=linux-mm&m=139593335312191&w=2
During the discussion of that series it was agreed that having runtime
allocation support for gigantic pages was a better solution.
This patch (of 5):
This function is going to be used by non-init code in a future
commit.
Signed-off-by: Luiz Capitulino <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: Zhang Yanfei <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Yasuaki Ishimatsu <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, I am seeing the following when I `mount -t hugetlbfs /none
/dev/hugetlbfs` and then simply do a `ls /dev/hugetlbfs`. I think it's
related to the fact that hugetlbfs is probably not correctly setting
itself up in this state:
Unable to handle kernel paging request for data at address 0x00000031
Faulting instruction address: 0xc000000000245710
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
....
In KVM guests on Power, in a guest not backed by hugepages, we see the
following:
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 64 kB
HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
are not supported at boot-time, but this is only checked in
hugetlb_init(). Extract the check to a helper function, and use it in a
few relevant places.
This does make hugetlbfs not supported (not registered at all) in this
environment. I believe this is fine, as there are no valid hugepages
and that won't change at runtime.
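The extracted helper boils down to an HPAGE_SHIFT check; a sketch of its shape (the real definition may be a macro):
	static inline bool hugepages_supported(void)
	{
		/* HPAGE_SHIFT == 0 means the arch found no hugepage sizes */
		return HPAGE_SHIFT != 0;
	}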
[[email protected]: use pr_info(), per Mel]
[[email protected]: fix build when HPAGE_SHIFT is undefined]
Signed-off-by: Nishanth Aravamudan <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The soft lockup when freeing gigantic hugepages, fixed in commit 55f67141a892
("mm: hugetlb: fix softlockup when a large number of hugepages are freed"),
can also happen in return_unused_surplus_pages(), so let's fix it there too.
Signed-off-by: Masayoshi Mizuma <[email protected]>
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Aneesh Kumar <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When I decrease the value of nr_hugepages in procfs a lot, a softlockup
happens. It is because there is no chance of a context switch during this
process.
On the other hand, when I allocate a large number of hugepages, there is
some chance of a context switch, hence a softlockup doesn't happen during
that process. So it's necessary to add a context switch to the
freeing process, as in the allocating process, to avoid the softlockup.
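The fix amounts to yielding periodically in the freeing loop; a sketch, assuming hugetlb_lock is held around it:
	while (min_count < persistent_huge_pages(h)) {
		if (!free_pool_huge_page(h, nodes_allowed, 0))
			break;
		/* briefly drop hugetlb_lock so a context switch can happen */
		cond_resched_lock(&hugetlb_lock);
	}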
When I freed 12 TB of hugepages with kernel-2.6.32-358.el6, the freeing
process occupied a CPU for over 150 seconds and the following softlockup
message appeared twice or more.
$ echo 6000000 > /proc/sys/vm/nr_hugepages
$ cat /proc/sys/vm/nr_hugepages
6000000
$ grep ^Huge /proc/meminfo
HugePages_Total: 6000000
HugePages_Free: 6000000
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
$ echo 0 > /proc/sys/vm/nr_hugepages
BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ...
Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1
Call Trace:
free_pool_huge_page+0xb8/0xd0
set_max_huge_pages+0x128/0x190
hugetlb_sysctl_handler_common+0x113/0x140
hugetlb_sysctl_handler+0x1e/0x20
proc_sys_call_handler+0x97/0xd0
proc_sys_write+0x14/0x20
vfs_write+0xb8/0x1a0
sys_write+0x51/0x90
__audit_syscall_exit+0x265/0x290
system_call_fastpath+0x16/0x1b
I have not confirmed this problem with upstream kernels because I am not
able to prepare a machine equipped with 12TB of memory right now. However,
I confirmed that the time required was directly proportional to the number
of hugepages being decreased.
I measured the required times on a smaller machine; it showed 130-145
hugepages decreased per millisecond.
Amount of decreasing      Required time    Decreasing rate
hugepages                 (msec)           (pages/msec)
------------------------------------------------------------
10,000 pages == 20GB      70 - 74          135-142
30,000 pages == 60GB      208 - 229        131-144
This means that, at this decreasing rate, decrementing 6TB of hugepages
will trigger a softlockup with the default threshold of 20 seconds.
Signed-off-by: Masayoshi Mizuma <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Aneesh Kumar <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Choi Gi-yong <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
To increase compiler portability there is <linux/compiler.h> which
provides convenience macros for various gcc constructs. Eg: __weak for
__attribute__((weak)). I've replaced all instances of gcc attributes with
the right macro in the memory management (/mm) subsystem.
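A representative conversion (illustrative function name, not a specific hunk):
	/* before */
	void __attribute__((weak)) arch_hook(void);

	/* after, with the <linux/compiler.h> macro */
	void __weak arch_hook(void);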
[[email protected]: while-we're-there consistency tweaks]
Signed-off-by: Gideon Israel Dsouza <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The NUMA scanning code can end up iterating over many gigabytes of
unpopulated memory, especially in the case of a freshly started KVM
guest with lots of memory.
This results in the mmu notifier code being called even when there are
no mapped pages in a virtual address range. The amount of time wasted
can be enough to trigger soft lockup warnings with very large KVM
guests.
This patch moves the mmu notifier call to the pmd level, which
represents 1GB areas of memory on x86-64. Furthermore, the mmu notifier
code is only called from the address in the PMD where present mappings
are first encountered.
The hugetlbfs code is left alone for now; hugetlb mappings are not
relocatable, and as such are left alone by the NUMA code, and should
never trigger this problem to begin with.
Signed-off-by: Rik van Riel <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Reported-by: Xing Gang <[email protected]>
Tested-by: Chegu Vinod <[email protected]>
Cc: Sasha Levin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
huge_pte_offset() could return NULL, so we need NULL check to avoid
potential NULL pointer dereferences.
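The added guard follows the usual pattern (sketch):
	pte_t entry, *ptep;

	ptep = huge_pte_offset(mm, address);
	if (!ptep)
		return NULL;	/* no page table at this address */
	entry = huge_ptep_get(ptep);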
Signed-off-by: Naoya Horiguchi <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Both prep_compound_huge_page() and prep_compound_gigantic_page() are
only called at bootstrap and can be marked as __init.
The __SetPageTail(page) in prep_compound_gigantic_page() happening
before page->first_page is initialized is not concerning since this is
bootstrap.
Signed-off-by: David Rientjes <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kernel can currently only handle a single hugetlb page fault at a
time. This is due to a single mutex that serializes the entire path.
This lock protects from spurious OOM errors under conditions of low
availability of free hugepages. This problem is specific to hugepages,
because it is normal to want to use every single hugepage in the system
- with normal pages we simply assume there will always be a few spare
pages which can be used temporarily until the race is resolved.
Address this problem by using a table of mutexes, allowing a better
chance of parallelization, where each hugepage is individually
serialized. The hash key is selected depending on the mapping type.
For shared ones it consists of the address space and file offset being
faulted; while for private ones the mm and virtual address are used.
The size of the table is selected based on a compromise of collisions
and memory footprint of a series of database workloads.
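A sketch of the hash-key selection described above (the helper name and exact mixing are illustrative):
	static u32 fault_mutex_hash(struct hstate *h, struct mm_struct *mm,
				    struct vm_area_struct *vma,
				    struct address_space *mapping,
				    pgoff_t idx, unsigned long address)
	{
		unsigned long key[2];

		if (vma->vm_flags & VM_SHARED) {
			key[0] = (unsigned long)mapping;	/* address space */
			key[1] = idx;				/* file offset */
		} else {
			key[0] = (unsigned long)mm;		/* private: mm */
			key[1] = address >> huge_page_shift(h);	/* and vaddr */
		}

		return jhash2((u32 *)&key, sizeof(key) / sizeof(u32), 0) &
		       (num_fault_mutexes - 1);
	}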
Large database workloads that make heavy use of hugepages can be
particularly exposed to this issue, causing start-up times to be
painfully slow. This patch reduces the startup time of a 10 Gb Oracle
DB (with ~5000 faults) from 37.5 secs to 25.7 secs. Larger workloads
will naturally benefit even more.
NOTE:
The only downside to this patch, detected by Joonsoo Kim, is that a
small race is possible in private mappings: a child process (with its
own mm, after cow) can instantiate a page that is already being handled
by the parent in a cow fault. When low on pages, this can trigger spurious
OOMs. I have not been able to think of an efficient way of handling
this... but do we really care about such a tiny window? We already
maintain another theoretical race with normal pages. If not, one
possible way is to maintain a single hash for private mappings --
any workloads that *really* suffer from this scaling problem should
already use shared mappings.
[[email protected]: remove stray + characters, go BUG if hugetlb_init() kmalloc fails]
Signed-off-by: Davidlohr Bueso <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: David Gibson <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Until now, we get a resv_map in two different ways, depending on the
mapping type. This makes the code messy and unreadable. Unify it.
[[email protected]: code cleanups]
Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: David Gibson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is a preparation patch to unify the use of vma_resv_map()
regardless of the map type. This patch prepares it by removing
resv_map_put(), which only works for HPAGE_RESV_OWNER's resv_map, not
for all resv_maps.
[[email protected]: update changelog]
Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: David Gibson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is a race condition if we map a same file on different processes.
Region tracking is protected by mmap_sem and hugetlb_instantiation_mutex.
When we do mmap, we don't grab a hugetlb_instantiation_mutex, but only
mmap_sem (exclusively). This doesn't prevent other tasks from modifying
the region structure, so it can be modified by two processes
concurrently.
To solve this, introduce a spinlock to resv_map and make the region
manipulation functions grab it before they do the actual work.
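After this change the structure and a region function look roughly like (sketch):
	struct resv_map {
		struct kref refs;
		spinlock_t lock;	/* protects the regions list below */
		struct list_head regions;
	};

	static long region_add(struct resv_map *resv, long f, long t)
	{
		spin_lock(&resv->lock);
		/* ... locate, merge and extend overlapping regions ... */
		spin_unlock(&resv->lock);
		return 0;
	}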
[[email protected]: updated changelog]
Signed-off-by: Davidlohr Bueso <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
Suggested-by: Joonsoo Kim <[email protected]>
Acked-by: David Gibson <[email protected]>
Cc: David Gibson <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
To change the protection method for region tracking to a fine-grained
one, we pass the resv_map, instead of a list_head, to the region
manipulation functions.
This doesn't introduce any functional change; it is just preparation for
the next step.
[[email protected]: update changelog]
Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: David Gibson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, to track reserved and allocated regions, we use two different
methods, depending on the mapping. For MAP_SHARED, we use the
address_mapping's private_list, while for MAP_PRIVATE we use a
resv_map.
Now, we are preparing to change the coarse-grained lock which protects the
region structure to a fine-grained one, and this difference hinders it.
So, before changing it, unify region structure handling, consistently
using a resv_map regardless of the kind of mapping.
Signed-off-by: Joonsoo Kim <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: David Gibson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since put_mems_allowed() is strictly optional (it's a seqcount retry), we
don't need to evaluate the function if the allocation was in fact
successful, saving an smp_rmb and some loads and comparisons on some
relatively fast paths.
Since the naming of get/put_mems_allowed() suggests a mandatory
pairing, rename the interface, as suggested by Mel, to resemble the
seqcount interface.
This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
where it is important to note that the return value of the latter call
is inverted from its previous incarnation.
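The resulting call pattern, where the retry is only evaluated when the allocation failed (sketch):
	unsigned int cpuset_mems_cookie;
	struct page *page;

	do {
		cpuset_mems_cookie = read_mems_allowed_begin();
		page = __alloc_pages_nodemask(gfp_mask, order,
					      zonelist, nodemask);
		/* on success, skip the seqcount re-check entirely */
	} while (!page && read_mems_allowed_retry(cpuset_mems_cookie));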
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Most of the VM_BUG_ON assertions are performed on a page. Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.
Based on recent requests to add a small piece of code that dumps the page
at various VM_BUG_ON sites, I've noticed that the page dump is quite
useful to people debugging issues in mm.
This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.
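A sketch of the new assertion (the real macro may additionally stringify the condition for the report):
	#define VM_BUG_ON_PAGE(cond, page)			\
		do {						\
			if (unlikely(cond)) {			\
				dump_page(page);		\
				BUG();				\
			}					\
		} while (0)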
[[email protected]: fix up includes]
Signed-off-by: Sasha Levin <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Switch to memblock interfaces for the early memory allocator instead of
the bootmem allocator. No functional change in behavior from the bootmem
users' point of view.
Archs already converted to NO_BOOTMEM now use memblock interfaces
directly instead of bootmem wrappers built on top of memblock. For the
archs which still use bootmem, these new APIs just fall back to the
existing bootmem APIs.
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Russell King <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Tony Lindgren <[email protected]>
Cc: Yinghai Lu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When copy_hugetlb_page_range() is called to copy a range of hugetlb
mappings, the secondary MMUs are not notified if there is a protection
downgrade, which breaks COW semantics in KVM.
This patch adds the necessary MMU notifier calls.
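The shape of the fix is to bracket the copy loop with start/end notifications whenever a write-protect downgrade (COW) can happen (sketch):
	mmun_start = vma->vm_start;
	mmun_end = vma->vm_end;
	if (cow)
		mmu_notifier_invalidate_range_start(src, mmun_start, mmun_end);

	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
		/* ... copy ptes; huge_ptep_set_wrprotect() when cow ... */
	}

	if (cow)
		mmu_notifier_invalidate_range_end(src, mmun_start, mmun_end);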
Signed-off-by: Andreas Sandberg <[email protected]>
Acked-by: Steve Capper <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There is no actual need for it, so keep it internal.
Signed-off-by: Andrea Arcangeli <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Pravin Shelar <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Andrea Arcangeli <[email protected]>
Cc: Khalid Aziz <[email protected]>
Cc: Pravin Shelar <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
get_page_foll() is more optimal and is always safe to use under the PT
lock. More so for hugetlbfs as there's no risk of race conditions with
split_huge_page regardless of the PT lock.
Signed-off-by: Andrea Arcangeli <[email protected]>
Tested-by: Khalid Aziz <[email protected]>
Cc: Pravin Shelar <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add calls to the new mmu_notifier_invalidate_range() function to all
places in the VMM that need it.
Signed-off-by: Joerg Roedel <[email protected]>
Reviewed-by: Andrea Arcangeli <[email protected]>
Reviewed-by: Jérôme Glisse <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Jay Cornwall <[email protected]>
Cc: Oded Gabbay <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Jesse Barnes <[email protected]>
Cc: David Woodhouse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Oded Gabbay <[email protected]>
|
|
Commit 7cb2ef56e6a8 ("mm: fix aio performance regression for database
caused by THP") can cause a dereference of a dangling pointer if
split_huge_page runs during PageHuge() while there are updates to the
tail_page->private field.
It also repeats compound_head twice for hugetlbfs, and runs
compound_head+compound_trans_head for THP, when a single call is needed in
both cases.
The new code within the PageSlab() check doesn't need to verify that the
THP page size is never bigger than the smallest hugetlbfs page size, to
avoid memory corruption.
A longstanding theoretical race condition was found while fixing the
above (see the change right after the skip_unlock label, that is
relevant for the compound_lock path too).
By re-establishing the _mapcount tail refcounting for all compound
pages, this also fixes the below problem:
echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
BUG: Bad page state in process bash pfn:59a01
page:ffffea000139b038 count:0 mapcount:10 mapping: (null) index:0x0
page flags: 0x1c00000000008000(tail)
Modules linked in:
CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x55/0x76
bad_page+0xd5/0x130
free_pages_prepare+0x213/0x280
__free_pages+0x36/0x80
update_and_free_page+0xc1/0xd0
free_pool_huge_page+0xc2/0xe0
set_max_huge_pages.part.58+0x14c/0x220
nr_hugepages_store_common.isra.60+0xd0/0xf0
nr_hugepages_store+0x13/0x20
kobj_attr_store+0xf/0x20
sysfs_write_file+0x189/0x1e0
vfs_write+0xc5/0x1f0
SyS_write+0x55/0xb0
system_call_fastpath+0x16/0x1b
Signed-off-by: Khalid Aziz <[email protected]>
Signed-off-by: Andrea Arcangeli <[email protected]>
Tested-by: Khalid Aziz <[email protected]>
Cc: Pravin Shelar <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Right now, the migration code in migrate_page_copy() uses copy_huge_page()
for hugetlbfs and thp pages:
	if (PageHuge(page) || PageTransHuge(page))
		copy_huge_page(newpage, page);
So, yay for code reuse. But:
	void copy_huge_page(struct page *dst, struct page *src)
	{
		struct hstate *h = page_hstate(src);
and a non-hugetlbfs page has no page_hstate(). This works 99% of the
time because page_hstate() determines the hstate from the page order
alone. Since the page order of a THP page matches the default hugetlbfs
page order, it works.
But, if you change the default huge page size on the boot command-line
(say default_hugepagesz=1G), then we might not even *have* a 2MB hstate
so page_hstate() returns null and copy_huge_page() oopses pretty fast
since copy_huge_page() dereferences the hstate:
	void copy_huge_page(struct page *dst, struct page *src)
	{
		struct hstate *h = page_hstate(src);

		if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
		...
Mel noticed that the migration code is really the only user of these
functions. This moves all the copy code over to migrate.c and makes
copy_huge_page() work for THP by checking for it explicitly.
I believe the bug was introduced in commit b32967ff101a ("mm: numa: Add
THP migration for the NUMA working set scanning fault case")
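After the move, copy_huge_page() can compute the page count explicitly instead of trusting page_hstate(); a sketch (helper names approximate):
	static void copy_huge_page(struct page *dst, struct page *src)
	{
		int i, nr_pages;

		if (PageHuge(src)) {
			/* hugetlbfs: the hstate knows the real order */
			struct hstate *h = page_hstate(src);
			nr_pages = pages_per_huge_page(h);
		} else {
			/* THP: always HPAGE_PMD_ORDER sized */
			VM_BUG_ON(!PageTransHuge(src));
			nr_pages = hpage_nr_pages(src);
		}

		for (i = 0; i < nr_pages; i++) {
			cond_resched();
			copy_highpage(dst + i, src + i);
		}
	}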
[[email protected]: fix coding-style and comment text, per Naoya Horiguchi]
Signed-off-by: Dave Hansen <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Tested-by: Dave Jiang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Hugetlb supports multiple page sizes. We use split lock only for PMD
level, but not for PUD.
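Lock selection then depends on the hugepage size; a sketch of the idea:
	static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
						   struct mm_struct *mm,
						   pte_t *pte)
	{
		/* a split lock only exists at the PMD level */
		if (huge_page_size(h) == PMD_SIZE)
			return pmd_lockptr(mm, (pmd_t *)pte);

		/* PUD-sized pages fall back to the per-mm lock */
		return &mm->page_table_lock;
	}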
[[email protected]: coding-style fixes]
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 11feeb498086 ("kvm: optimize away THP checks in
kvm_is_mmio_pfn()") introduced a memory leak when KVM is run on gigantic
compound pages.
That commit depends on the assumption that PG_reserved is identical for
all head and tail pages of a compound page, so that if get_user_pages
returns a tail page we don't need to check the head page in order to
know whether we are dealing with a reserved page that requires different
refcounting.
The assumption that PG_reserved is the same for head and tail pages is
certainly correct for THP and regular hugepages, but gigantic hugepages
allocated through bootmem don't clear the PG_reserved on the tail pages
(the clearing of PG_reserved is done later only if the gigantic hugepage
is freed).
This patch corrects the gigantic compound page initialization so that we
can retain the optimization in 11feeb498086. The cacheline was already
modified in order to set PG_tail so this won't affect the boot time of
large memory systems.
[[email protected]: tweak comment layout and grammar]
Signed-off-by: Andrea Arcangeli <[email protected]>
Reported-by: andy123 <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Gleb Natapov <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Acked-by: Rafael Aquini <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We should clear the page's private flag when returning the page to the
hugepage pool. Otherwise, a marked hugepage can be allocated to a user
who tries to allocate a non-reserved hugepage. If this user fails to
map the hugepage, they will try to return the page to the hugepage pool.
Since this page has the private flag set, resv_huge_pages would mistakenly
increase. This patch fixes this situation.
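The fix is essentially one flag clear on the return path (sketch):
	/* page goes back to the pool: drop the reservation marker so
	 * free_huge_page() doesn't bump resv_huge_pages again */
	ClearPagePrivate(page);
	put_page(page);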
Signed-off-by: Joonsoo Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: David Gibson <[email protected]>
Cc: Wanpeng Li <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Hillf Danton <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|