Age | Commit message (Collapse) | Author | Files | Lines |
|
cpumask_t is very big struct and cpu_vm_mask is placed wrong position.
It might lead to reduce cache hit ratio.
This patch has two change.
1) Move the place of cpumask into last of mm_struct. Because usually cpumask
is accessed only front bits when the system has cpu-hotplug capability
2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory
footprint if cpumask_size() will use nr_cpumask_bits properly in future.
In addition, this patch change the name of cpu_vm_mask with cpu_vm_mask_var.
It may help to detect out of tree cpu_vm_mask users.
This patch has no functional change.
[[email protected]: build fix]
[[email protected]: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: David Howells <[email protected]>
Cc: Koichi Yasutake <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Chris Metcalf <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We don't need to hold the mmmap_sem through mem_cgroup_newpage_charge(),
the mmap_sem is only hold for keeping the vma stable and we don't need the
vma stable anymore after we return from alloc_hugepage_vma().
Signed-off-by: Andrea Arcangeli <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Some of these functions have grown beyond inline sanity, move them
out-of-line.
Signed-off-by: Peter Zijlstra <[email protected]>
Requested-by: Andrew Morton <[email protected]>
Requested-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Optimize the page_lock_anon_vma() fast path to be one atomic op, instead
of two.
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Straightforward conversion of anon_vma->lock to a mutex.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Convert page_lock_anon_vma() over to use refcounts. This is done to
prepare for the conversion of anon_vma from spinlock to mutex.
Sadly this inceases the cost of page_lock_anon_vma() from one to two
atomics, a follow up patch addresses this, lets keep that simple for now.
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
A slightly more verbose comment to go along with the trickery in
page_lock_anon_vma().
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Its beyond ugly and gets in the way.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Straightforward conversion of i_mmap_lock to a mutex.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Hugh says:
"The only significant loser, I think, would be page reclaim (when
concurrent with truncation): could spin for a long time waiting for
the i_mmap_mutex it expects would soon be dropped? "
Counter points:
- cpu contention makes the spin stop (need_resched())
- zap pages should be freeing pages at a higher rate than reclaim
ever can
I think the simplification of the truncate code is definitely worth it.
Effectively reverts: 2aa15890f3c ("mm: prevent concurrent
unmap_mapping_range() on the same inode") and takes out the code that
caused its problem.
Signed-off-by: Peter Zijlstra <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In order to convert i_mmap_lock to a mutex we need a mutex equivalent to
spin_lock_nest_lock(), thus provide the mutex_lock_nest_lock() annotation.
As with spin_lock_nest_lock(), mutex_lock_nest_lock() allows annotation of
the locking pattern where an outer lock serializes the acquisition order
of nested locks. That is, if every time you lock multiple locks A, say A1
and A2 you first acquire N, the order of acquiring A1 and A2 is
irrelevant.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Instead of using a single batch (the small on-stack, or an allocated
page), try and extend the batch every time it runs out and only flush once
either the extend fails or we're done.
Signed-off-by: Peter Zijlstra <[email protected]>
Requested-by: Nick Piggin <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In case other architectures require RCU freed page-tables to implement
gup_fast() and software filled hashes and similar things, provide the
means to do so by moving the logic into generic code.
Signed-off-by: Peter Zijlstra <[email protected]>
Requested-by: David Miller <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fold all the mmu_gather rework patches into one for submission
Signed-off-by: Peter Zijlstra <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix up the um mmu_gather code to conform to the new API.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix up the ia64 mmu_gather code to conform to the new API.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Tony Luck <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix up the sh mmu_gather code to conform to the new API.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Paul Mundt <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix up the arm mmu_gather code to conform to the new API.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Adapt the stand-alone s390 mmu_gather implementation to the new
preemptible mmu_gather interface.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rework the sparc mmu_gather usage to conform to the new world order :-)
Sparc mmu_gather does two things:
- tracks vaddrs to unhash
- tracks pages to free
Split these two things like powerpc has done and keep the vaddrs
in per-cpu data structures and flush them on context switch.
The remaining bits can then use the generic mmu_gather.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: David Miller <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fix up powerpc to the new mmu_gather stuff.
PPC has an extra batching queue to RCU free the actual pagetable
allocations, use the ARCH extentions for that for now.
For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the
hardware hash-table, keep using per-cpu arrays but flush on context switch
and use a TLF bit to track the lazy_mmu state.
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Rework the existing mmu_gather infrastructure.
The direct purpose of these patches was to allow preemptible mmu_gather,
but even without that I think these patches provide an improvement to the
status quo.
The first 9 patches rework the mmu_gather infrastructure. For review
purpose I've split them into generic and per-arch patches with the last of
those a generic cleanup.
The next patch provides generic RCU page-table freeing, and the followup
is a patch converting s390 to use this. I've also got 4 patches from
DaveM lined up (not included in this series) that uses this to implement
gup_fast() for sparc64.
Then there is one patch that extends the generic mmu_gather batching.
After that follow the mm preemptibility patches, these make part of the mm
a lot more preemptible. It converts i_mmap_lock and anon_vma->lock to
mutexes which together with the mmu_gather rework makes mmu_gather
preemptible as well.
Making i_mmap_lock a mutex also enables a clean-up of the truncate code.
This also allows for preemptible mmu_notifiers, something that XPMEM I
think wants.
Furthermore, it removes the new and universially detested unmap_mutex.
This patch:
Remove the first obstacle towards a fully preemptible mmu_gather.
The current scheme assumes mmu_gather is always done with preemption
disabled and uses per-cpu storage for the page batches. Change this to
try and allocate a page for batching and in case of failure, use a small
on-stack array to make some progress.
Preemptible mmu_gather is desired in general and usable once i_mmap_lock
becomes a mutex. Doing it before the mutex conversion saves us from
having to rework the code by moving the mmu_gather bits inside the
pte_lock.
Also avoid flushing the tlb batches from under the pte lock, this is
useful even without the i_mmap_lock conversion as it significantly reduces
pte lock hold times.
[[email protected]: fix comment tpyo]
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: David Miller <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Russell King <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Tony Luck <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Namhyung Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently we have expand_upwards exported while expand_downwards is
accessible only via expand_stack or expand_stack_downwards.
check_stack_guard_page is a nice example of the asymmetry. It uses
expand_stack for VM_GROWSDOWN while expand_upwards is called for
VM_GROWSUP case.
Let's clean this up by exporting both functions and make those names
consistent. Let's use expand_{upwards,downwards} because expanding
doesn't always involve stack manipulation (an example is
ia64_do_page_fault which uses expand_upwards for registers backing store
expansion). expand_downwards has to be defined for both
CONFIG_STACK_GROWS{UP,DOWN} because get_arg_page calls the downwards
version in the early process initialization phase for growsup
configuration.
Signed-off-by: Michal Hocko <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The vmap allocator is used to, among other things, allocate per-cpu vmap
blocks, where each vmap block is naturally aligned to its own size.
Obviously, leaving a guard page after each vmap area forbids packing vmap
blocks efficiently and can make the kernel run out of possible vmap blocks
long before overall vmap space is exhausted.
The new interface to map a user-supplied page array into linear vmalloc
space (vm_map_ram) insists on allocating from a vmap block (instead of
falling back to a custom area) when the area size is below a certain
threshold. With heavy users of this interface (e.g. XFS) and limited
vmalloc space on 32-bit, vmap block exhaustion is a real problem.
Remove the guard page from the core vmap allocator. vmalloc and the old
vmap interface enforce a guard page on their own at a higher level.
Note that without this patch, we had accidental guard pages after those
vm_map_ram areas that happened to be at the end of a vmap block, but not
between every area. This patch removes this accidental guard page only.
If we want guard pages after every vm_map_ram area, this should be done
separately. And just like with vmalloc and the old interface on a
different level, not in the core allocator.
Mel pointed out: "If necessary, the guard page could be reintroduced as a
debugging-only option (CONFIG_DEBUG_PAGEALLOC?). Otherwise it seems
reasonable."
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Dave Chinner <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
VM_BUG_ON() if effectively a BUG_ON() undef #ifdef CONFIG_DEBUG_VM. That
is exactly what we have here now, and two different folks have suggested
doing it this way.
Signed-off-by: Dave Hansen <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Running sparse on page_alloc.c today, it errors out:
include/linux/gfp.h:254:17: error: bad constant expression
include/linux/gfp.h:254:17: error: cannot size expression
which is a line in gfp_zone():
BUILD_BUG_ON((GFP_ZONE_BAD >> bit) & 1);
That's really unfortunate, because it ends up hiding all of the other
legitimate sparse messages like this:
mm/page_alloc.c:5315:59: warning: incorrect type in argument 1 (different base types)
mm/page_alloc.c:5315:59: expected unsigned long [unsigned] [usertype] size
mm/page_alloc.c:5315:59: got restricted gfp_t [usertype] <noident>
...
Having sparse be able to catch these very oopsable bugs is a lot more
important than keeping a BUILD_BUG_ON(). Kill the BUILD_BUG_ON().
Compiles on x86_64 with and without CONFIG_DEBUG_VM=y. defconfig boots
fine for me.
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There's a kernel-wide shortage of per-process flags, so it's always
helpful to trim one when possible without incurring a significant penalty.
It's even more important when you're planning on adding a per- process
flag yourself, which I plan to do shortly for transparent hugepages.
PF_OOM_ORIGIN is used by ksm and swapoff to prefer current since it has a
tendency to allocate large amounts of memory and should be preferred for
killing over other tasks. We'd rather immediately kill the task making
the errant syscall rather than penalizing an innocent task.
This patch removes PF_OOM_ORIGIN since its behavior is equivalent to
setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.
The process's old oom_score_adj is stored and then set to
OOM_SCORE_ADJ_MAX during the time it used to have PF_OOM_ORIGIN. The old
value is then reinstated when the process should no longer be considered a
high priority for oom killing.
Signed-off-by: David Rientjes <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Izik Eidus <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
__GFP_NO_KSWAPD
It's uncertain this has been beneficial, so it's safer to undo it. All
other compaction users would still go in synchronous mode if a first
attempt at async compaction failed. Hopefully we don't need to force
special behavior for THP (which is the only __GFP_NO_KSWAPD user so far
and it's the easier to exercise and to be noticeable). This also make
__GFP_NO_KSWAPD return to its original strict semantics specific to bypass
kswapd, as THP allocations have khugepaged for the async THP
allocations/compactions.
Signed-off-by: Andrea Arcangeli <[email protected]>
Cc: Alex Villacis Lasso <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, cpu hotplug updates pcp->stat_threshold, but memory hotplug
doesn't. There is no reason for this.
[[email protected]: fix CONFIG_SMP=n build]
Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, memory hotplug calls setup_per_zone_wmarks() and
calculate_zone_inactive_ratio(), but doesn't call
setup_per_zone_lowmem_reserve().
It means the number of reserved pages aren't updated even if memory hot
plug occur. This patch fixes it.
Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
should be __meminit.
Commit bce7394a3e ("page-allocator: reset wmark_min and inactive ratio of
zone when hotplug happens") introduced invalid section references. Now,
setup_per_zone_inactive_ratio() is marked __init and then it can't be
referenced from memory hotplug code.
This patch marks it as __meminit and also marks caller as __ref.
Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: Yasunori Goto <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When an oom killing occurs, almost all processes are getting stuck at the
following two points.
1) __alloc_pages_nodemask
2) __lock_page_or_retry
1) is not very problematic because TIF_MEMDIE leads to an allocation
failure and getting out from page allocator.
2) is more problematic. In an OOM situation, zones typically don't have
page cache at all and memory starvation might lead to greatly reduced IO
performance. When a fork bomb occurs, TIF_MEMDIE tasks don't die quickly,
meaning that a fork bomb may create new process quickly rather than the
oom-killer killing it. Then, the system may become livelocked.
This patch makes the pagefault interruptible by SIGKILL.
Signed-off-by: KOSAKI Motohiro <[email protected]>
Reviewed-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
commit 2687a356 ("Add lock_page_killable") introduced killable
lock_page(). Similarly this patch introdues killable
wait_on_page_locked().
Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
commit 2ac390370a ("writeback: add
/sys/devices/system/node/<node>/vmstat") added vmstat entry. But
strangely it only show nr_written and nr_dirtied.
# cat /sys/devices/system/node/node20/vmstat
nr_written 0
nr_dirtied 0
Of course, It's not adequate. With this patch, the vmstat show all vm
stastics as /proc/vmstat.
# cat /sys/devices/system/node/node0/vmstat
nr_free_pages 899224
nr_inactive_anon 201
nr_active_anon 17380
nr_inactive_file 31572
nr_active_file 28277
nr_unevictable 0
nr_mlock 0
nr_anon_pages 17321
nr_mapped 8640
nr_file_pages 60107
nr_dirty 33
nr_writeback 0
nr_slab_reclaimable 6850
nr_slab_unreclaimable 7604
nr_page_table_pages 3105
nr_kernel_stack 175
nr_unstable 0
nr_bounce 0
nr_vmscan_write 0
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 260
nr_dirtied 1050
nr_written 938
numa_hit 962872
numa_miss 0
numa_foreign 0
numa_interleave 8617
numa_local 962872
numa_other 0
nr_anon_transparent_hugepages 0
[[email protected]: no externs in .c files]
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Michael Rubin <[email protected]>
Cc: Wu Fengguang <[email protected]>
Acked-by: David Rientjes <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Because 'ret' is declared as int, not unsigned long, no need to cast the
error contants into unsigned long. If you compile this code on a 64-bit
machine somehow, you'll see following warning:
CC mm/nommu.o
mm/nommu.c: In function `do_mmap_pgoff':
mm/nommu.c:1411: warning: overflow in implicit constant conversion
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Cc: David Howells <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If f_op->read() fails and sysctl_nr_trim_pages > 1, there could be a
memory leak between @region->vm_end and @region->vm_top.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Cc: David Howells <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now we have the sorted vma list, use it in do_munmap() to check that we
have an exact match.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Cc: David Howells <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now we have the sorted vma list, use it in the find_vma[_exact]() rather
than doing linear search on the rb-tree.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Cc: David Howells <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since commit 297c5eee3724 ("mm: make the vma list be doubly linked") made
it a doubly linked list, we don't need to scan the list when deleting
@vma.
And the original code didn't update the prev pointer. Fix it too.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Cc: David Howells <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When I was reading nommu code, I found that it handles the vma list/tree
in an unusual way. IIUC, because there can be more than one
identical/overrapped vmas in the list/tree, it sorts the tree more
strictly and does a linear search on the tree. But it doesn't applied to
the list (i.e. the list could be constructed in a different order than
the tree so that we can't use the list when finding the first vma in that
order).
Since inserting/sorting a vma in the tree and link is done at the same
time, we can easily construct both of them in the same order. And linear
searching on the tree could be more costly than doing it on the list, it
can be converted to use the list.
Also, after the commit 297c5eee3724 ("mm: make the vma list be doubly
linked") made the list be doubly linked, there were a couple of code need
to be fixed to construct the list properly.
Patch 1/6 is a preparation. It maintains the list sorted same as the tree
and construct doubly-linked list properly. Patch 2/6 is a simple
optimization for the vma deletion. Patch 3/6 and 4/6 convert tree
traversal to list traversal and the rest are simple fixes and cleanups.
This patch:
@vma added into @mm should be sorted by start addr, end addr and VMA
struct addr in that order because we may get identical VMAs in the @mm.
However this was true only for the rbtree, not for the list.
This patch fixes this by remembering 'rb_prev' during the tree traversal
like find_vma_prepare() does and linking the @vma via __vma_link_list().
After this patch, we can iterate the whole VMAs in correct order simply by
using @mm->mmap list.
[[email protected]: avoid duplicating __vma_link_list()]
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Greg Ungerer <[email protected]>
Cc: David Howells <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Sergey Senozhatsky <[email protected]>
Reviewed-by: Christoph Lameter <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Avoid merging a VMA with another VMA which is cloned from the parent process.
The cloned VMA shares the anon_vma lock with the parent process's VMA. If
we do the merge, more vmas (even the new range is only for current
process) use the perent process's anon_vma lock. This introduces
scalability issues. find_mergeable_anon_vma() already considers this
case.
Signed-off-by: Shaohua Li <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If we only change vma->vm_end, we can avoid taking anon_vma lock even if
'insert' isn't NULL, which is the case of split_vma.
As I understand it, we need the lock before because rmap must get the
'insert' VMA when we adjust old VMA's vm_end (the 'insert' VMA is linked
to anon_vma list in __insert_vm_struct before).
But now this isn't true any more. The 'insert' VMA is already linked to
anon_vma list in __split_vma(with anon_vma_clone()) instead of
__insert_vm_struct. There is no race rmap can't get required VMAs. So
the anon_vma lock is unnecessary, and this can reduce one locking in brk
case and improve scalability.
Signed-off-by: Shaohua Li<[email protected]>
Cc: Rik van Riel <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Make some variables have correct alignment/section to avoid cache issue.
In a workload which heavily does mmap/munmap, the variables will be used
frequently.
Signed-off-by: Shaohua Li <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Architectures that implement their own show_mem() function did not pass
the filter argument to show_free_areas() to appropriately avoid emitting
the state of nodes that are disallowed in the current context. This patch
now passes the filter argument to show_free_areas() so those nodes are now
avoided.
This patch also removes the show_free_areas() wrapper around
__show_free_areas() and converts existing callers to pass an empty filter.
ia64 emits additional information for each node, so skip_free_areas_zone()
must be made global to filter disallowed nodes and it is converted to use
a nid argument rather than a zone for this use case.
Signed-off-by: David Rientjes <[email protected]>
Cc: Russell King <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Kyle McMartin <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guan Xuetao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This is not useful: it provides page->virtual and is used with highmem.
xtensa has no support for highmem and those HIGHMEM bits which are found
by grep are partly implemented. The interesting functions like kmap() are
missing. If someone actually implements the complete HIGHMEM support he
could use HASHED_PAGE_VIRTUAL like most others do.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is not referenced by any Makefile. pte_alloc_one_kernel() and
pte_alloc_one() is implemented in arch/xtensa/include/asm/pgalloc.h.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It should check if strict_strtoul() succeeds.
[[email protected]: don't override strict_strtoul() return value]
Signed-off-by: Liu Yuan <[email protected]>
Acked-by: Michael Hennerich <[email protected]>
Cc: Richard Purdie <[email protected]>
Cc: Paul Mundt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It has been reported on some laptops that kswapd is consuming large
amounts of CPU and not being scheduled when SLUB is enabled during large
amounts of file copying. It is expected that this is due to kswapd
missing every cond_resched() point because;
shrink_page_list() calls cond_resched() if inactive pages were isolated
which in turn may not happen if all_unreclaimable is set in
shrink_zones(). If for whatver reason, all_unreclaimable is
set on all zones, we can miss calling cond_resched().
balance_pgdat() only calls cond_resched if the zones are not
balanced. For a high-order allocation that is balanced, it
checks order-0 again. During that window, order-0 might have
become unbalanced so it loops again for order-0 and returns
that it was reclaiming for order-0 to kswapd(). It can then
find that a caller has rewoken kswapd for a high-order and
re-enters balance_pgdat() without ever calling cond_resched().
shrink_slab only calls cond_resched() if we are reclaiming slab
pages. If there are a large number of direct reclaimers, the
shrinker_rwsem can be contended and prevent kswapd calling
cond_resched().
This patch modifies the shrink_slab() case. If the semaphore is
contended, the caller will still check cond_resched(). After each
successful call into a shrinker, the check for cond_resched() remains in
case one shrinker is particularly slow.
[[email protected]: preserve call to cond_resched after each call into shrinker]
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Wu Fengguang <[email protected]>
Cc: James Bottomley <[email protected]>
Tested-by: Colin King <[email protected]>
Cc: Raghavendra D Prabhu <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: <[email protected]> [2.6.38+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There are a few reports of people experiencing hangs when copying large
amounts of data with kswapd using a large amount of CPU which appear to be
due to recent reclaim changes. SLUB using high orders is the trigger but
not the root cause as SLUB has been using high orders for a while. The
root cause was bugs introduced into reclaim which are addressed by the
following two patches.
Patch 1 corrects logic introduced by commit 1741c877 ("mm: kswapd:
keep kswapd awake for high-order allocations until a percentage of
the node is balanced") to allow kswapd to go to sleep when
balanced for high orders.
Patch 2 notes that it is possible for kswapd to miss every
cond_resched() and updates shrink_slab() so it'll at least reach
that scheduling point.
Chris Wood reports that these two patches in isolation are sufficient to
prevent the system hanging. AFAIK, they should also resolve similar hangs
experienced by James Bottomley.
This patch:
Johannes Weiner poined out that the logic in commit 1741c877 ("mm: kswapd:
keep kswapd awake for high-order allocations until a percentage of the
node is balanced") is backwards. Instead of allowing kswapd to go to
sleep when balancing for high order allocations, it keeps it kswapd
running uselessly.
Signed-off-by: Mel Gorman <[email protected]>
Reviewed-by: Rik van Riel <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Wu Fengguang <[email protected]>
Cc: James Bottomley <[email protected]>
Tested-by: Colin King <[email protected]>
Cc: Raghavendra D Prabhu <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Rik van Riel <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Reviewed-by: Wu Fengguang <[email protected]>
Cc: <[email protected]> [2.6.38+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|