path: root/include/linux/mman.h
Commit log (newest first) — each entry: date, subject, author, diffstat (files, -deleted/+added)
2017-06-20  percpu_counter: Rename __percpu_counter_add to percpu_counter_add_batch  (Nikolay Borisov; 1 file, -1/+1)
Currently, percpu_counter_add() is a wrapper around __percpu_counter_add(), which is preempt-safe due to explicit calls to preempt_disable(). Given how the __ prefix is used in percpu-related interfaces, the naming unfortunately creates the false sense that __percpu_counter_add() is less safe than percpu_counter_add(). In terms of context safety, they're equivalent. The only difference is that the __ version takes a batch parameter.

Make this a bit more explicit by just renaming __percpu_counter_add() to percpu_counter_add_batch().

This patch doesn't cause any functional changes.

tj: Minor updates to patch description for clarity. Cosmetic indentation updates.

Signed-off-by: Nikolay Borisov <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: [email protected]
Cc: "David S. Miller" <[email protected]>
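A minimal sketch of the renamed interface, assuming the wrapper relationship described above; percpu_counter_batch is the default batch size used by the unbatched entry point:

    /* The batched primitive now carries an explicit name... */
    void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
                                  s32 batch);

    /* ...while the common entry point keeps its old name and simply
     * supplies the default batch size. */
    static inline void percpu_counter_add(struct percpu_counter *fbc,
                                          s64 amount)
    {
            percpu_counter_add_batch(fbc, amount, percpu_counter_batch);
    }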
2016-08-02  include: mman: use bool instead of int for the return value of arch_validate_prot  (Chen Gang; 1 file, -1/+1)
For a function that returns a pure boolean, bool expresses the intent a little better than int.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Chen Gang <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
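A sketch of the changed helper; the commit itself only flips the return type from int to bool, and the default body shown here (masking against the generic PROT_* bits) is an assumption about linux/mman.h:

    #ifndef arch_validate_prot
    /* Called from mprotect(); returns true if the prot bits are valid.
     * Architectures with extra protection bits override this. */
    static inline bool arch_validate_prot(unsigned long prot)
    {
            return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
    }
    #endif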
2016-02-18  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()  (Dave Hansen; 1 file, -3/+3)
This plumbs a protection key through calc_vm_flag_bits(). We could have done this in calc_vm_prot_bits(), but I did not feel super strongly which way to go. It was pretty arbitrary which one to use.

Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arve Hjønnevåg <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chen Gang <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Geliang Tang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Riley Andrews <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
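A hypothetical sketch of the plumbing pattern described: an extra pkey argument threaded into the translation helper so the architecture hook can fold the key into the returned VM_* bits. The exact flag names and an arch hook taking the pkey are assumptions, not quotes of the patch:

    /* Illustrative only: the real patch's names may differ. The pkey
     * rides along so the arch hook can encode it into vm_flags. */
    static inline unsigned long
    calc_vm_flag_bits(unsigned long flags, unsigned long pkey)
    {
            return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN) |
                   _calc_vm_trans(flags, MAP_LOCKED,    VM_LOCKED)    |
                   arch_calc_vm_flag_bits(flags, pkey);
    }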
2014-01-21  mm: add overcommit_kbytes sysctl variable  (Jerome Marchand; 1 file, -0/+1)
Some applications that run on HPC clusters are designed around the availability of RAM, and the overcommit ratio is fine-tuned to get the maximum usage of memory without swapping. With growing memory, the 1%-of-all-RAM granularity provided by overcommit_ratio has become too coarse for these workloads (on a 2TB machine it represents no less than 20GB).

This patch adds the new overcommit_kbytes sysctl variable that allows a much finer granularity.

[[email protected]: coding-style fixes]
[[email protected]: fix nommu build]
Signed-off-by: Jerome Marchand <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Alan Cox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
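A sketch of how the commit-limit helper (factored out by the commit below) can honor the new knob; the hugetlb adjustment and the kB-to-pages shift are assumptions about mm/util.c:

    /* Sketch: an absolute limit in kbytes takes precedence over the
     * percentage-of-RAM ratio; swap is always added on top. */
    unsigned long vm_commit_limit(void)
    {
            unsigned long allowed;

            if (sysctl_overcommit_kbytes)
                    allowed = sysctl_overcommit_kbytes >> (PAGE_SHIFT - 10);
            else
                    allowed = ((totalram_pages - hugetlb_total_pages())
                               * sysctl_overcommit_ratio / 100);

            return allowed + total_swap_pages;
    }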
2013-11-13  mm: factor commit limit calculation  (Jerome Marchand; 1 file, -0/+2)
The same calculation is currently done in three different places. Factor that code so that future changes need to be made in only one place.

[[email protected]: uninline vm_commit_limit()]
Signed-off-by: Jerome Marchand <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2013-07-03  mm: tune vm_committed_as percpu_counter batching size  (Tim Chen; 1 file, -1/+7)
Currently the per-cpu counter's batch size for memory accounting is configured as twice the number of cpus in the system. However, for systems with very large memory, it is more appropriate to make it proportional to the memory size per cpu in the system.

For example, for an x86_64 system with 64 cpus and 128 GB of memory, the batch size is only 2*64 pages (0.5 MB). So any memory accounting change of more than 0.5MB will overflow the per-cpu counter into the global counter. Instead, under the new scheme, the batch size is configured to be 0.4% of the memory per cpu = 8MB (128 GB / 64 / 256), which is more in line with the memory size.

I've done a repeated brk test of 800KB (from the will-it-scale test suite) with 80 concurrent processes on a 4-socket Westmere machine with a total of 40 cores. Without the patch, about 80% of cpu time is spent on spin-lock contention within the vm_committed_as counter. With the patch, there's a 73x speedup on the benchmark and the lock contention drops off almost entirely.

[[email protected]: fix section mismatch]
Signed-off-by: Tim Chen <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Wu Fengguang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
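A sketch of the tuning rule described, with an assumed helper name; dividing the per-cpu page count by 256 gives roughly the 0.4% figure quoted above:

    /* Hypothetical helper name. Keep the old 2*ncpus value as a floor
     * so small-memory systems are unaffected; clamp to s32 range. */
    static void compute_vm_committed_batch(void)
    {
            s32 nr = num_present_cpus();
            s32 cpu_floor = max_t(s32, nr * 2, 32);
            u64 mem_batch = min_t(u64, (totalram_pages / nr) / 256,
                                  0x7fffffff);

            vm_committed_as_batch = max_t(s32, mem_batch, cpu_floor);
    }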
2013-03-28Revert "mm: introduce VM_POPULATE flag to better deal with racy userspace ↵Michel Lespinasse1-3/+1
programs" This reverts commit 186930500985 ("mm: introduce VM_POPULATE flag to better deal with racy userspace programs"). VM_POPULATE only has any effect when userspace plays racy games with vmas by trying to unmap and remap memory regions that mmap or mlock are operating on. Also, the only effect of VM_POPULATE when userspace plays such games is that it avoids populating new memory regions that get remapped into the address range that was being operated on by the original mmap or mlock calls. Let's remove VM_POPULATE as there isn't any strong argument to mandate a new vm_flag. Signed-off-by: Michel Lespinasse <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-02-23  mm: introduce VM_POPULATE flag to better deal with racy userspace programs  (Michel Lespinasse; 1 file, -1/+3)
The vm_populate() code populates user mappings without constantly holding the mmap_sem. This makes it susceptible to racy userspace programs: the user mappings may change while vm_populate() is running, and in this case vm_populate() may end up populating the new mapping instead of the old one.

In order to reduce the possibility of userspace getting surprised by this behavior, this change introduces the VM_POPULATE vma flag, which gets set on vmas we want vm_populate() to work on. This way vm_populate() may still end up populating the new mapping after such a race, but only if the new mapping is also one that the user has requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated.

Signed-off-by: Michel Lespinasse <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Tested-by: Andy Lutomirski <[email protected]>
Cc: Greg Ungerer <[email protected]>
Cc: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2012-11-15  mm: export a function to get vm committed memory  (K. Y. Srinivasan; 1 file, -0/+2)
It will be useful to be able to access global memory commitment from device drivers. On the Hyper-V platform, the host has a policy engine to balance the available physical memory amongst all competing virtual machines hosted on a given node. This policy engine is driven by a number of metrics, including the memory commitment reported by the guests. The balloon driver for Linux on Hyper-V will use this function to retrieve guest memory commitment. This function is also used in the Xen self-ballooning code.

[[email protected]: coding-style tweak]
Signed-off-by: K. Y. Srinivasan <[email protected]>
Acked-by: David Rientjes <[email protected]>
Acked-by: Dan Magenheimer <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Jeremy Fitzhardinge <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
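A sketch of the exported accessor, assuming the vm_committed_as percpu counter introduced by the 2009 underflow fix further down this log:

    /* Modules such as the Hyper-V balloon driver can now read the
     * global commitment; read_positive hides transient negatives. */
    unsigned long vm_memory_committed(void)
    {
            return percpu_counter_read_positive(&vm_committed_as);
    }
    EXPORT_SYMBOL_GPL(vm_memory_committed);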
2012-10-13  UAPI: (Scripted) Disintegrate include/linux  (David Howells; 1 file, -11/+1)
Signed-off-by: David Howells <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Michael Kerrisk <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Dave Jones <[email protected]>
2012-10-09  mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas  (Konstantin Khlebnikov; 1 file, -1/+0)
Currently the kernel sets mm->exe_file during sys_execve() and then tracks the number of vmas with the VM_EXECUTABLE flag in mm->num_exe_file_vmas; as soon as this counter drops to zero, the kernel resets mm->exe_file to NULL. It also resets mm->exe_file at the last mmput(), when mm->mm_users drops to zero.

A VMA with the VM_EXECUTABLE flag appears after mapping a file with the MAP_EXECUTABLE flag. Such vmas can appear only at sys_execve() or after vma splitting, because sys_mmap ignores this flag. Usually a binfmt module sets mm->exe_file and mmaps executable vmas with this file; they hold mm->exe_file while the task is running.

Comment from v2.6.25-6245-g925d1c4 ("procfs task exe symlink"), where all this stuff was introduced:

> The kernel implements readlink of /proc/pid/exe by getting the file from
> the first executable VMA. Then the path to the file is reconstructed and
> reported as the result.
>
> Because of the VMA walk the code is slightly different on nommu systems.
> This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
> walking the VMAs to find the first executable file-backed VMA we store a
> reference to the exec'd file in the mm_struct.
>
> That reference would prevent the filesystem holding the executable file
> from being unmounted even after unmapping the VMAs. So we track the number
> of VM_EXECUTABLE VMAs and drop the new reference when the last one is
> unmapped. This avoids pinning the mounted filesystem.

exe_file's vma accounting is hooked into every file mmap/munmap and vma split/merge just to fix a hypothetical case of an mm pinning a filesystem against unmounting after it has already unmapped all of its executable files but is still alive. It seems that currently nobody depends on this behaviour. We can try to remove this logic and keep mm->exe_file until the final mmput().

mm->exe_file is still protected with mm->mmap_sem, because we want to change it via the new sys_prctl(PR_SET_MM_EXE_FILE). Via this syscall a task can also change its mm->exe_file and unpin the mountpoint explicitly.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Carsten Otte <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Morris <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: Kentaro Takeda <[email protected]>
Cc: Matt Helsley <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robert Richter <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Venkatesh Pallipadi <[email protected]>
Acked-by: Linus Torvalds <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2011-07-26  atomic: use <linux/atomic.h>  (Arun Sharma; 1 file, -1/+1)
This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h>.

Signed-off-by: Arun Sharma <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: David Miller <[email protected]>
Cc: Eric Dumazet <[email protected]>
Acked-by: Mike Frysinger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2009-05-02  mm: fix Committed_AS underflow on large NR_CPUS environment  (KOSAKI Motohiro; 1 file, -6/+3)
The Committed_AS field can underflow in certain situations:

> # while true; do cat /proc/meminfo | grep _AS; sleep 1; done | uniq -c
>  1 Committed_AS: 18446744073709323392 kB
> 11 Committed_AS: 18446744073709455488 kB
>  6 Committed_AS: 35136 kB
>  5 Committed_AS: 18446744073709454400 kB
>  7 Committed_AS: 35904 kB
>  3 Committed_AS: 18446744073709453248 kB
>  2 Committed_AS: 34752 kB
>  9 Committed_AS: 18446744073709453248 kB
>  8 Committed_AS: 34752 kB
>  3 Committed_AS: 18446744073709320960 kB
>  7 Committed_AS: 18446744073709454080 kB
>  3 Committed_AS: 18446744073709320960 kB
>  5 Committed_AS: 18446744073709454080 kB
>  6 Committed_AS: 18446744073709320960 kB

This happens because NR_CPUS can be greater than 1000 and meminfo_proc_show() does not check for underflow. Scaling with NR_CPUS isn't a good calculation anyway: in general, the likelihood of lock contention is proportional to the number of online cpus, not the theoretical maximum (NR_CPUS). The current kernel has generic percpu_counter infrastructure; using it is the right way. It simplifies the code, and percpu_counter_read_positive() avoids the underflow issue.

Reported-by: Dave Hansen <[email protected]>
Signed-off-by: KOSAKI Motohiro <[email protected]>
Cc: Eric B Munson <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: <[email protected]> [All kernel versions]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
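A sketch of what the conversion looks like on the linux/mman.h side, assuming the counter is named vm_committed_as as elsewhere in this log:

    extern struct percpu_counter vm_committed_as;

    /* Accounting becomes a percpu add; readers use
     * percpu_counter_read_positive() and can no longer underflow. */
    static inline void vm_acct_memory(long pages)
    {
            percpu_counter_add(&vm_committed_as, pages);
    }

    static inline void vm_unacct_memory(long pages)
    {
            vm_acct_memory(-pages);
    }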
2008-07-09  mm: Allow architectures to define additional protection bits  (Dave Kleikamp; 1 file, -1/+28)
This patch allows architectures to define functions to deal with additional protection bits for mmap() and mprotect():

  arch_calc_vm_prot_bits()  maps additional protection bits to vm_flags
  arch_vm_get_page_prot()   maps additional vm_flags to the vma's vm_page_prot
  arch_validate_prot()      checks for valid values of the protection bits

Note: vm_get_page_prot() is now pretty ugly, but the generated code should be identical for architectures that don't define additional protection bits.

Signed-off-by: Dave Kleikamp <[email protected]>
Acked-by: Andrew Morton <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
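A sketch of the default no-op hooks such a change adds to linux/mman.h; the exact defaults are assumptions (arch_validate_prot() is sketched after the 2016 entry above):

    /* Architectures that support extra PROT_* bits override these in
     * their own <asm/mman.h>; everyone else gets free no-ops. */
    #ifndef arch_calc_vm_prot_bits
    #define arch_calc_vm_prot_bits(prot)    0
    #endif

    #ifndef arch_vm_get_page_prot
    #define arch_vm_get_page_prot(vm_flags) __pgprot(0)
    #endif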
2008-05-24  mm: fix atomic_t overflow in vm  (Alan Cox; 1 file, -2/+2)
The atomic_t type is 32-bit, but a 64-bit system can have more than 2^32 pages of virtual address space available. Without this, we overflow on ludicrously large mappings.

Signed-off-by: Alan Cox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
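A sketch of the fix, assuming the counter of that era was the pre-percpu_counter vm_committed_space (the 2009 commit above later replaces it):

    /* atomic_long_t matches the native word size, so on 64-bit
     * systems the counter can represent more than 2^32 pages. */
    extern atomic_long_t vm_committed_space;

    static inline void vm_acct_memory(long pages)
    {
            atomic_long_add(pages, &vm_committed_space);
    }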
2006-04-26  Don't include linux/config.h from anywhere else in include/  (David Woodhouse; 1 file, -1/+0)
Signed-off-by: David Woodhouse <[email protected]>
2006-04-25  Sanitise linux/mman.h for userspace consumption  (David Woodhouse; 1 file, -5/+8)
It only really needs to define a few constants and include <asm/mman.h> when it's used by userspace. Move the rest within #ifdef __KERNEL__.

Signed-off-by: David Woodhouse <[email protected]>
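A sketch of the resulting header layout; the constants shown are ones linux/mman.h has historically carried, but treat the exact split as an assumption:

    #ifndef _LINUX_MMAN_H
    #define _LINUX_MMAN_H

    #include <asm/mman.h>

    /* Usable from userspace. */
    #define MREMAP_MAYMOVE  1
    #define MREMAP_FIXED    2

    #define OVERCOMMIT_GUESS        0
    #define OVERCOMMIT_ALWAYS       1
    #define OVERCOMMIT_NEVER        2

    #ifdef __KERNEL__
    /* Kernel-only pieces (accounting helpers, calc_vm_*_bits(),
     * the arch_* hooks) live below this guard. */
    #endif /* __KERNEL__ */

    #endif /* _LINUX_MMAN_H */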
2005-04-16  Linux-2.6.12-rc2  (Linus Torvalds; 1 file, -0/+67)
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!