Fix build failures on UML.
Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Print short info about fatal segfaults, like other archs do.
Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The ucast transport is similar to the mcast transport (and, in fact,
shares most of its code), only it uses UDP unicast to move packets.
Obviously this is only useful for point-to-point connections between
virtual ethernet devices.
Signed-off-by: Nolan Leake <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
Cc: David Miller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
User Mode Linux can also benefit from earlyprintk. UML's earlyprintk
writes kernel messages directly to stdout.
Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The UML kernel ignores SIGHUP anyway, so this handler serves no purpose.
Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
UML_LIB_PATH is hardcoded to /usr/lib/uml/; on 64-bit systems it needs to
be /usr/lib64/uml/.
Signed-off-by: Richard Weinberger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Adapt to the new API.
[[email protected]: coding-style fixes]
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Mikael Starvik <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Thiago Farina <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Adapt to the new API.
We plan to remove the old cpumask APIs later, so this patch converts the
remaining users to the new ones.
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: David Howells <[email protected]>
Cc: Koichi Yasutake <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Chris Metcalf <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Allow people to use gpiolib on Alpha if they want to, mostly for build
coverage. The header is a straight copy of the Microblaze one, which in
turn was taken from PowerPC.
[[email protected]: define GENERIC_GPIO]
Signed-off-by: Mark Brown <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Acked-by: Grant Likely <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We plan to remove the old cpu_xx() APIs, so convert their users now. This
patch has no functional change.
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, on nommu architectures, mmap(), mremap() and munmap() do not
page-align sizes, which is inconsistent with MMU architectures and causes
some issues.
First, some drivers' mmap() functions depend on vma->vm_end - vma->vm_start
being page aligned, which is true on MMU architectures but not on nommu,
e.g. the uvc camera driver.
Second, munmap() may return -EINVAL (from the file-split path) in cases
where the end passed in from userspace is not page aligned but vma->vm_end
is, due to a split or the driver's mmap() op.
Add page alignment to fix those issues.
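As an illustration only (a sketch of the idea rather than the actual diff;
the helper name is hypothetical), the change boils down to rounding the
request the way the MMU path already does:
        /*
         * Sketch only: round a nommu mapping request up to a page boundary,
         * so that vma->vm_end - vma->vm_start is always page aligned,
         * just as on MMU kernels.
         */
        static inline unsigned long nommu_aligned_len(unsigned long len)
        {
                return PAGE_ALIGN(len);
        }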
[[email protected]: coding-style fixes]
Signed-off-by: Bob Liu <[email protected]>
Cc: David Howells <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The zone->lru_lock is heavily contended in workloads where activate_page()
is called frequently. We can batch activate_page() calls to reduce the
lock contention. The batched pages are added to the zone lists when the
pool is full or when page reclaim tries to drain them.
For example, on a 4-socket, 64-CPU system, create a sparse file and 64
processes that share a mapping of the file. Each process reads the whole
file and then exits. The process exit does unmap_vmas() and causes a lot
of activate_page() calls. In this workload we saw about a 58% reduction in
total time with the patch below. Other workloads that call activate_page()
a lot benefit as well.
Andrew Morton suggested that activate_page() and putback_lru_pages() should
follow the same path to activate pages, but this is hard to implement (see
commit 7a608572a282a ("Revert "mm: batch activate_page() to reduce lock
contention")). On the other hand, do we really need putback_lru_pages()
to follow the same path? I tested several FIO/FFSB benchmarks (about 20
scripts per benchmark) on three machines here, from 2 sockets to 4 sockets.
My testing doesn't show anything significant with or without the patch
below (there is a slight difference, but it is mostly noise that we saw
even before the patch). The patch below basically returns to the same
approach as my first post.
I tested some microbenchmarks:
case-anon-cow-rand-mt 0.58%
case-anon-cow-rand -3.30%
case-anon-cow-seq-mt -0.51%
case-anon-cow-seq -5.68%
case-anon-r-rand-mt 0.23%
case-anon-r-rand 0.81%
case-anon-r-seq-mt -0.71%
case-anon-r-seq -1.99%
case-anon-rx-rand-mt 2.11%
case-anon-rx-seq-mt 3.46%
case-anon-w-rand-mt -0.03%
case-anon-w-rand -0.50%
case-anon-w-seq-mt -1.08%
case-anon-w-seq -0.12%
case-anon-wx-rand-mt -5.02%
case-anon-wx-seq-mt -1.43%
case-fork 1.65%
case-fork-sleep -0.07%
case-fork-withmem 1.39%
case-hugetlb -0.59%
case-lru-file-mmap-read-mt -0.54%
case-lru-file-mmap-read 0.61%
case-lru-file-mmap-read-rand -2.24%
case-lru-file-readonce -0.64%
case-lru-file-readtwice -11.69%
case-lru-memcg -1.35%
case-mmap-pread-rand-mt 1.88%
case-mmap-pread-rand -15.26%
case-mmap-pread-seq-mt 0.89%
case-mmap-pread-seq -69.72%
case-mmap-xread-rand-mt 0.71%
case-mmap-xread-seq-mt 0.38%
The most significant are:
case-lru-file-readtwice -11.69%
case-mmap-pread-rand -15.26%
case-mmap-pread-seq -69.72%
which use activate_page() a lot. The others are basically noise, because
each run differs slightly.
In the UP case, 'size mm/swap.o':
before the two patches:
text data bss dec hex filename
6466 896 4 7366 1cc6 mm/swap.o
after the two patches:
text data bss dec hex filename
6343 896 4 7243 1c4b mm/swap.o
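For illustration, the batching pattern described above looks roughly like
this (a sketch; __activate_page() and pagevec_lru_move_fn() are the
mm/swap.c helpers the pool drains into, and page reclaim also drains these
per-CPU pagevecs before isolating pages):
        static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);

        void activate_page(struct page *page)
        {
                if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
                        struct pagevec *pvec = &get_cpu_var(activate_page_pvecs);

                        page_cache_get(page);
                        /* flush the pool into the zone lists once it fills up */
                        if (!pagevec_add(pvec, page))
                                pagevec_lru_move_fn(pvec, __activate_page, NULL);
                        put_cpu_var(activate_page_pvecs);
                }
        }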
Signed-off-by: Shaohua Li <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Hiroyuki Kamezawa <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The copy_to_user_page() function is supposed to flush the icache on the
memory that was written, but the current asm-generic version lacks that
logic. While normally this isn't a big deal, since the asm-generic icache
flush is a stub, it does matter for ports that want to use the asm-generic
version as a baseline and then overlay their own specific parts (like
icache flushing).
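A sketch of what the asm-generic definition then looks like (simplified;
the exact flush helper naming may differ per port):
        #define copy_to_user_page(vma, page, vaddr, dst, src, len)      \
                do {                                                    \
                        memcpy(dst, src, len);                          \
                        flush_icache_user_range(vma, page, vaddr, len); \
                } while (0)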
Signed-off-by: Mike Frysinger <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I believe I found a problem in __alloc_pages_slowpath, which allows a
process to get stuck endlessly looping, even when lots of memory is
available.
Running an I/O and memory intensive stress-test I see a 0-order page
allocation with __GFP_IO and __GFP_WAIT, running on a system with very
little free memory. Right about the same time that the stress-test gets
killed by the OOM-killer, the utility trying to allocate memory gets stuck
in __alloc_pages_slowpath even though most of the system's memory was freed
by the oom-kill of the stress-test.
The utility ends up looping continuously from the rebalance label down
through wait_iff_congested(). Because order=0,
__alloc_pages_direct_compact skips the call to get_page_from_freelist.
Because all of the reclaimable memory on the system has already been
reclaimed, __alloc_pages_direct_reclaim skips the call to
get_page_from_freelist. Since there is no __GFP_FS flag, the block with
__alloc_pages_may_oom is skipped. The loop hits the wait_iff_congested,
then jumps back to rebalance without ever trying to
get_page_from_freelist. This loop repeats infinitely.
The test case is pretty pathological. Running a mix of I/O stress-tests
that do a lot of fork() and consume all of the system memory, I can hit
this pretty reliably on 600 nodes in about 12 hours, with 32GB per node.
Signed-off-by: Andrew Barry <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: Rik van Riel<[email protected]>
Acked-by: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add the SECTION_ALIGN_UP() and SECTION_ALIGN_DOWN() macros, which align a
given pfn up or down to a section boundary, respectively.
This is required for the latest memory hotplug support for the Xen balloon
driver.
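For reference, plausible definitions in terms of the existing section
helpers (a sketch; the real macros live in include/linux/mmzone.h):
        #define SECTION_ALIGN_UP(pfn)   (((pfn) + PAGES_PER_SECTION - 1) & PAGE_SECTION_MASK)
        #define SECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SECTION_MASK)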
Signed-off-by: Daniel Kiper <[email protected]>
Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The noswapaccount parameter has been deprecated since 2.6.38 without any
complaints from users so we can remove it. swapaccount=0|1 can be used
instead.
As we are removing the parameter, we can also clean up swapaccount: it no
longer has to accept an empty string (to match noswapaccount), so we can
push the '=' into the __setup() macro rather than checking for the "=1"
and "=0" strings.
Signed-off-by: Michal Hocko <[email protected]>
Cc: Hiroyuki Kamezawa <[email protected]>
Cc: Daisuke Nishimura <[email protected]>
Cc: Balbir Singh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In show_numa_map() we collect statistics into a numa_maps structure.
Since the number of NUMA nodes can be very large, this structure is not a
candidate for stack allocation.
Instead of going through a kmalloc()+kfree() cycle each time show_numa_map()
is invoked, perform the allocation just once when /proc/pid/numa_maps is
opened.
Performing the allocation when numa_maps is opened, and thus before a
reference to the target task's mm is taken, eliminates a potential
stalemate condition in the oom-killer as originally described by Hugh
Dickins:
... imagine what happens if the system is out of memory, and the mm
we're looking at is selected for killing by the OOM killer: while
we wait in __get_free_page for more memory, no memory is freed
from the selected mm because it cannot reach exit_mmap while we hold
that reference.
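A sketch of the resulting shape (struct and helper names are assumed for
illustration): the numa_maps buffer is embedded in a per-open private
structure and allocated in the ->open() handler, before any reference to
the target mm exists.
        struct numa_maps_private {
                struct proc_maps_private proc_maps;
                struct numa_maps md;
        };

        static int numa_maps_open(struct inode *inode, struct file *file)
        {
                struct numa_maps_private *priv;

                priv = kzalloc(sizeof(*priv), GFP_KERNEL);
                if (!priv)
                        return -ENOMEM;
                /* hand priv to the seq_file machinery; the mm reference is
                 * only taken later, in ->start(), so the OOM killer is never
                 * blocked on this allocation */
                return do_numa_maps_open(inode, file, priv);    /* hypothetical helper */
        }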
Signed-off-by: Stephen Wilson <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now that mm/mempolicy.c is no longer implementing /proc/pid/numa_maps
there is no need to export struct proc_maps_private to the world. Move it
to fs/proc/internal.h instead.
Signed-off-by: Stephen Wilson <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Moving show_numa_map() from mempolicy.c to task_mmu.c solves several
issues.
- Having the show() operation "miles away" from the corresponding
seq_file iteration operations is a maintenance burden.
- The need to export ad hoc info like struct proc_maps_private is
eliminated.
- The implementation of show_numa_map() can be improved in a simple
manner by cooperating with the other seq_file operations (start,
stop, etc) -- something that would be messy to do without this
change.
Signed-off-by: Stephen Wilson <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When CONFIG_TMPFS=n mpol_to_str() is not declared in mempolicy.h.
However, in the NUMA case, the definition is always compiled.
Since it is not strictly true that tmpfs is the only client, and since the
symbol was always lurking around anyway, export mpol_to_str()
unconditionally. Furthermore, this will allow us to move show_numa_map()
out of mempolicy.c and into the procfs subsystem.
Signed-off-by: Stephen Wilson <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This function has been superseded by gather_hugetbl_stats() and is no
longer needed.
Signed-off-by: Stephen Wilson <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Improve the prototype of gather_stats() to take a struct numa_maps as
argument instead of a generic void *. Update all callers to make the
required type explicit.
Since gather_stats() is not needed before its definition and is scheduled
to be moved out of mempolicy.c, the forward declaration is removed as well.
Signed-off-by: Stephen Wilson <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Mapping statistics in a NUMA environment is now computed using the generic
walk_page_range() logic. Remove the old/equivalent functionality.
Signed-off-by: Stephen Wilson <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Converting show_numa_map() to use the generic routine decouples the
function from mempolicy.c, allowing it to be moved out of the mm subsystem
and into fs/proc.
Also, include KSM pages in /proc/pid/numa_maps statistics. The pagewalk
logic implemented by check_pte_range() failed to account for such pages as
they were not applicable to the page migration case.
Signed-off-by: Stephen Wilson <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In commit 48fce3429d ("mempolicies: unexport get_vma_policy()")
get_vma_policy() was marked static as all clients were local to
mempolicy.c.
However, the decision to generate /proc/pid/numa_maps in the numa memory
policy code and outside the procfs subsystem introduces an artificial
interdependency between the two systems. Exporting get_vma_policy() once
again is the first step to clean up this interdependency.
Signed-off-by: Stephen Wilson <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove noMMU declaration of shmem_get_unmapped_area() from mm.h: it fell
out of use in 2.6.21 and ceased to exist in 2.6.29.
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Implement generic xattrs for tmpfs filesystems. The Fedora project, while
trying to replace suid apps with file capabilities, realized that tmpfs,
which is used on the build systems, does not support file capabilities and
thus cannot be used to build packages which use file capabilities. Xattrs
are also needed for overlayfs.
The xattr interface is a bit odd. If a filesystem does not implement any
{get,set,list}xattr functions the VFS will call into some random LSM hooks
and the running LSM can then implement some method for handling xattrs.
SELinux for example provides a method to support security.selinux but no
other security.* xattrs.
As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have
xattr handler routines specifically to handle acls. Because of this tmpfs
would lose the VFS/LSM helpers to support the running LSM. To make up
for that tmpfs had stub functions that did nothing but call into the LSM
hooks which implement the helpers.
This new patch does not use the LSM fallback functions and instead just
implements a native get/set/list xattr feature for the full security.* and
trusted.* namespace like a normal filesystem. This means that tmpfs can
now support both security.selinux and security.capability, which was not
previously possible.
The basic implementation is that I attach a:
struct shmem_xattr {
struct list_head list; /* anchored by shmem_inode_info->xattr_list */
char *name;
size_t size;
char value[0];
};
Into the struct shmem_inode_info for each xattr that is set. This
implementation could easily support the user.* namespace as well, except
some care needs to be taken to prevent large amounts of unswappable memory
being allocated for unprivileged users.
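As a purely userspace illustration (not part of the patch), exercising the
new support on a tmpfs-backed file could look like this; /dev/shm is
assumed to be tmpfs, and the trusted.* namespace needs CAP_SYS_ADMIN:
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/xattr.h>
        #include <unistd.h>

        int main(void)
        {
                const char *path = "/dev/shm/xattr-test";       /* assumed tmpfs mount */
                const char *val = "demo";
                int fd = open(path, O_CREAT | O_WRONLY, 0600);

                if (fd < 0) {
                        perror("open");
                        return 1;
                }
                close(fd);
                if (setxattr(path, "trusted.example", val, strlen(val), 0))
                        perror("setxattr");     /* fails without CAP_SYS_ADMIN or without this support */
                return 0;
        }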
[[email protected]: new config option, suport trusted.*, support symlinks]
Signed-off-by: Eric Paris <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Tested-by: Serge Hallyn <[email protected]>
Cc: Kyle McMartin <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Tested-by: Jordi Pujol <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The bootmem wrapper with memblock supports top-down now, so we no longer
need this trick.
Signed-off-by: Yinghai LU <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Olaf Hering <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The bootmem wrapper with memblock supports top-down now, so we do not need
to set the low limit to __pa(MAX_DMA_ADDRESS).
The logic should be: prefer to allocate above __pa(MAX_DMA_ADDRESS), but it
is OK if we cannot find memory above 16M on a system that has only a small
amount of RAM.
Signed-off-by: Yinghai LU <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Olaf Hering <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Lucas De Marchi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The problem with having two different types of counters is that developers
adding new code need to keep in mind whether it's safe to use both the
atomic and non-atomic implementations. For example, when adding new
callers of the *_mm_counter() functions a developer needs to ensure that
those paths are always executed with page_table_lock held, in case we're
using the non-atomic implementation of mm counters.
Hugh Dickins introduced the atomic mm counters in commit f412ac08c986
("[PATCH] mm: fix rss and mmlist locking"). When asked why he left the
non-atomic counters around he said,
| The only reason was to avoid adding costly atomic operations into a
| configuration that had no need for them there: the page_table_lock
| sufficed.
|
| Certainly it would be simpler just to delete the non-atomic variant.
|
| And I think it's fair to say that any configuration on which we're
| measuring performance to that degree (rather than "does it boot fast?"
| type measurements), would already be going the split ptlocks route.
Removing the non-atomic counters eases the maintenance burden because
developers no longer have to be mindful of two implementations when using
*_mm_counter().
Note that all architectures provide a means of atomically updating
atomic_long_t variables, even if they have to revert to the generic
spinlock implementation because they don't support 64-bit atomic
instructions (see lib/atomic64.c).
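With the non-atomic flavour gone, the counter helpers reduce to something
like the following (a simplified sketch of the include/linux/mm.h helpers,
ignoring the SPLIT_RSS_COUNTING details):
        static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
        {
                atomic_long_add(value, &mm->rss_stat.count[member]);
        }

        static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
        {
                long val = atomic_long_read(&mm->rss_stat.count[member]);

                return val < 0 ? 0 : (unsigned long)val;
        }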
Signed-off-by: Matt Fleming <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Christoph Lameter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The page allocator will improperly return a page from ZONE_NORMAL even
when __GFP_DMA is passed if CONFIG_ZONE_DMA is disabled. The caller
expects DMA memory, perhaps for ISA devices with 16-bit address registers,
and may get higher memory resulting in undefined behavior.
This patch causes the page allocator to return NULL in such circumstances
with a warning emitted to the kernel log on the first occurrence.
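The rejected condition is essentially the following (a sketch; the helper
name is hypothetical, and the real check sits inside the page allocator and
warns once via WARN_ON_ONCE()):
        static inline bool gfp_dma_impossible(gfp_t gfp_mask)
        {
                /* __GFP_DMA requested, but the kernel has no ZONE_DMA to satisfy it */
                return !IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA);
        }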
Signed-off-by: David Rientjes <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Do not define PFN_SECTION_SHIFT if !CONFIG_SPARSEMEM.
Signed-off-by: Daniel Kiper <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
pfn_to_section_nr()/section_nr_to_pfn() are valid only in the
CONFIG_SPARSEMEM context. Move them to the proper place.
Signed-off-by: Daniel Kiper <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
set_page_section() is meaningful only in the CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP context. Move it to the proper place and amend
the functions that use it accordingly.
Signed-off-by: Daniel Kiper <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
online_pages() is only compiled for CONFIG_MEMORY_HOTPLUG_SPARSE, so there
is no need to support CONFIG_FLATMEM code within it.
This patch removes code that is never used.
Signed-off-by: Daniel Kiper <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Acked-by: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is pointless for deactivate_page() to operate on unevictable pages.
This patch removes that unnecessary overhead, which could be a real problem
when there are many unevictable pages in the system (e.g. an mprotect
workload).
[[email protected]: tidy up comment]
Reviewed-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Minchan Kim <[email protected]>
Reviewed-by: Rik van Riel<[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Previously, mmap sequential readahead was triggered by updating
ra->prev_pos on each page fault and comparing it with the current page
offset. That costs dirtying the cache line on each _minor_ page fault.
In the mosbench exim benchmark, which does multi-threaded page faults on a
shared struct file, the ra->mmap_miss and ra->prev_pos updates were found
to cause excessive cache line bouncing on tmpfs, which actually has
readahead disabled entirely (shmem_backing_dev_info.ra_pages == 0).
So remove the ra->prev_pos recording and instead tag PG_readahead to
trigger possible sequential readahead. This is not only simpler, it also
works more reliably and reduces cache line bouncing on concurrent page
faults on a shared struct file.
Signed-off-by: Wu Fengguang <[email protected]>
Tested-by: Tim Chen <[email protected]>
Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The original INT_MAX is too large; reduce it to
- avoid unnecessarily dirtying/bouncing the cache line
- restore mmap read-around faster on changed access pattern
Background: in the mosbench exim benchmark which does multi-threaded page
faults on shared struct file, the ra->mmap_miss updates are found to cause
excessive cache line bouncing on tmpfs. The ra state updates are needless
for tmpfs because readahead is disabled there entirely
(shmem_backing_dev_info.ra_pages == 0).
Tested-by: Tim Chen <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Reduce readahead overheads by returning early in do_sync_mmap_readahead().
tmpfs has ra_pages=0 and can page-fault really fast (not constrained by
I/O unless swapping).
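A sketch of the early return (simplified signature; the real function also
does mmap_miss bookkeeping and read-around sizing):
        static void do_sync_mmap_readahead(struct file *file, struct file_ra_state *ra,
                                           pgoff_t offset)
        {
                if (!ra->ra_pages)
                        return;         /* e.g. tmpfs: readahead disabled, skip all the work */

                page_cache_sync_readahead(file->f_mapping, ra, file, offset, ra->ra_pages);
        }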
Signed-off-by: Wu Fengguang <[email protected]>
Tested-by: Tim Chen <[email protected]>
Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Change each shrinker's API by consolidating the existing parameters into a
shrink_control struct. This will make it simpler to add further features
without touching every shrinker.
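For reference, the consolidated argument block and the resulting callback
signature look roughly like this (sketch):
        struct shrink_control {
                gfp_t gfp_mask;                 /* allocation context of the reclaim caller */
                unsigned long nr_to_scan;       /* how many objects the shrinker should scan */
        };

        /* each shrinker's callback then becomes: */
        /* int (*shrink)(struct shrinker *shrinker, struct shrink_control *sc); */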
[[email protected]: fix build]
[[email protected]: fix warning]
[[email protected]: fix up new shrinker API]
[[email protected]: fix xfs warning]
[[email protected]: update gfs2]
Signed-off-by: Ying Han <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Minchan Kim <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Steven Whitehouse <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Consolidate the existing parameters to shrink_slab() into a new
shrink_control struct. This is needed later to pass the same struct to
shrinkers.
Signed-off-by: Ying Han <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Minchan Kim <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: Mel Gorman <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pass __GFP_NORETRY|__GFP_NOWARN for readahead page allocations.
Readahead page allocations are completely optional. They are OK to fail
and in particular must not trigger the OOM killer on their own behalf.
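A sketch of the flag combination (the helper name is hypothetical; the
point is only which bits get ORed in):
        static inline gfp_t readahead_gfp_mask(struct address_space *mapping)
        {
                /* optional allocation: no retry loops, no allocation-failure warning */
                return mapping_gfp_mask(mapping) | __GFP_NORETRY | __GFP_NOWARN;
        }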
Reported-by: Dave Young <[email protected]>
Reviewed-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Wu Fengguang <[email protected]>
Reviewed-by: Minchan Kim <[email protected]>
Reviewed-by: Pekka Enberg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This fixes a problem where the first pageblock got marked MIGRATE_RESERVE
even though it only had a few free pages. For example, on the current ARM
port the kernel starts at offset 0x8000 to leave room for boot parameters,
and that memory is freed later.
This in turn caused no contiguous memory to be reserved and frequent kswapd
wakeups that emptied the caches to get more contiguous memory.
Unfortunately, ARM needs an order-2 allocation for the pgd (see
arm/mm/pgd.c#pgd_alloc()), so the issue is neither minor nor easily
avoidable.
[[email protected]: added some explanation]
[[email protected]: add !pfn_valid_within() to check]
[[email protected]: check end_pfn in pageblock_is_reserved]
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Arve Hjønnevåg <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For m32r, N_NORMAL_MEMORY represents all nodes that have present memory
since it does not support HIGHMEM. This patch sets the bit at the time
the node is initialized.
If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
uses this nodemask to setup per-cache kmem_cache_node data structures.
Signed-off-by: David Rientjes <[email protected]>
Cc: Hirokazu Takata <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For alpha, N_NORMAL_MEMORY represents all nodes that have present memory
since it does not support HIGHMEM. This patch sets the bit at the time
the node is initialized.
If N_NORMAL_MEMORY is not accurate, slub may encounter errors since it
uses this nodemask to setup per-cache kmem_cache_node data structures.
Signed-off-by: David Rientjes <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
isolate_lru_page() must be called only with a stable reference to the page;
this is what the comment above it says, and it is reasonable.
Current isolate_lru_page() users and their sources of the extra page
reference:
mm/huge_memory.c:
__collapse_huge_page_isolate() - reference from pte
mm/memcontrol.c:
mem_cgroup_move_parent() - get_page_unless_zero()
mem_cgroup_move_charge_pte_range() - reference from pte
mm/memory-failure.c:
soft_offline_page() - fixed, reference from get_any_page()
delete_from_lru_cache() - reference from caller or get_page_unless_zero()
[ This looks like a bug, because __memory_failure() can call page_action()
for a hugepage tail page, but it is OK for isolate_lru_page(): the tail
has had a reference taken and is not on the LRU. ]
mm/memory_hotplug.c:
do_migrate_range() - fixed, get_page_unless_zero()
mm/mempolicy.c:
migrate_page_add() - reference from pte
mm/migrate.c:
do_move_page_to_node_array() - reference from follow_page()
mlock.c: - various external references
mm/vmscan.c:
putback_lru_page() - reference from isolate_lru_page()
It seems that all isolate_lru_page() users are now ready for this
restriction. So, let's replace the redundant get_page_unless_zero() with
get_page() and add a check of the page's initial reference count with
VM_BUG_ON().
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Drop the first page reference only after calling isolate_lru_page(), to
keep a stable page reference while isolating.
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
isolate_lru_page() must be called only with a stable reference to the page,
so grab a normal page reference.
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Lee Schermerhorn <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I was tracking down a page allocation failure that ended up in vmalloc().
Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
of memory, we'll still get a warning with "order:0" in it. That's not
very useful.
During recovery, vmalloc() also nicely frees all of the memory that it got
up to the point of the failure. That is wonderful, but it also quickly
hides any issues. We have a much different situation if vmalloc()
repeatedly fails 10GB into a:
vmalloc(100 * 1<<30);
versus repeatedly failing 4096 bytes into a:
vmalloc(8192);
This patch will print out messages that look like this:
[ 68.123503] vmalloc: allocation failure, allocated 6680576 of 13426688 bytes
[ 68.124218] bash: page allocation failure: order:0, mode:0xd2
[ 68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e68-dirty #333
[ 68.125579] Call Trace:
[ 68.125853] [<ffffffff810f6da6>] warn_alloc_failed+0x146/0x170
[ 68.126464] [<ffffffff8107e05c>] ? printk+0x6c/0x70
[ 68.126791] [<ffffffff8112b5d4>] ? alloc_pages_current+0x94/0xe0
[ 68.127661] [<ffffffff8111ed37>] __vmalloc_node_range+0x237/0x290
...
The 'order' variable is added for clarity when calling warn_alloc_failed()
to avoid having an unexplained '0' as an argument.
The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would take
us over 80 columns for the alloc_pages_node() call. If we are going to
add a line, it might as well be one that makes the sucker easier to read.
As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
solely on an unverified value passed in from userspace. Granted, it's
under CAP_SYS_ADMIN, but it still frightens me a bit.
Signed-off-by: Dave Hansen <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This originally started as a simple patch to give vmalloc() some more
verbose output on failure on top of the plain page allocator messages.
Johannes suggested that it might be nicer to lead with the vmalloc() info
_before_ the page allocator messages.
But, I do think there's a lot of value in what __alloc_pages_slowpath()
does with its filtering and so forth.
This patch creates a new function which other allocators can call instead
of relying on the internal page allocator warnings. It also gives this
function private rate-limiting which separates it from other
printk_ratelimit() users.
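A simplified sketch of such a helper (the real mm/page_alloc.c version also
takes a printf-style format for caller-specific context and dumps memory
state):
        void warn_alloc_failed(gfp_t gfp_mask, int order)
        {
                /* private ratelimit state, separate from other printk_ratelimit() users */
                static DEFINE_RATELIMIT_STATE(nopage_rs, DEFAULT_RATELIMIT_INTERVAL,
                                              DEFAULT_RATELIMIT_BURST);

                if ((gfp_mask & __GFP_NOWARN) || !__ratelimit(&nopage_rs))
                        return;

                pr_warn("%s: page allocation failure: order:%d, mode:0x%x\n",
                        current->comm, order, (unsigned int)gfp_mask);
                dump_stack();
        }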
Signed-off-by: Dave Hansen <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|