aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-11-05mm: hugetlb: proc: add HugetlbPages field to /proc/PID/statusNaoya Horiguchi6-1/+37
Currently there's no easy way to get per-process usage of hugetlb pages, which is inconvenient because userspace applications which use hugetlb typically want to control their processes on the basis of how much memory (including hugetlb) they use. So this patch simply provides easy access to the info via /proc/PID/status. Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Joern Engel <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smapsNaoya Horiguchi2-0/+46
Currently /proc/PID/smaps provides no usage info for vma(VM_HUGETLB), which is inconvenient when we want to know per-task or per-vma base hugetlb usage. To solve this, this patch adds new fields for hugetlb usage like below: Size: 20480 kB Rss: 0 kB Pss: 0 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 0 kB Anonymous: 0 kB AnonHugePages: 0 kB Shared_Hugetlb: 18432 kB Private_Hugetlb: 2048 kB Swap: 0 kB KernelPageSize: 2048 kB MMUPageSize: 2048 kB Locked: 0 kB VmFlags: rd wr mr mw me de ht [[email protected]: fix Private_Hugetlb alignment ] Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Joern Engel <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm: use only per-device readahead limitRoman Gushchin3-19/+5
Maximal readahead size is limited now by two values: 1) by global 2Mb constant (MAX_READAHEAD in max_sane_readahead()) 2) by configurable per-device value* (bdi->ra_pages) There are devices, which require custom readahead limit. For instance, for RAIDs it's calculated as number of devices multiplied by chunk size times 2. Readahead size can never be larger than bdi->ra_pages * 2 value (POSIX_FADV_SEQUNTIAL doubles readahead size). If so, why do we need two limits? I suggest to completely remove this max_sane_readahead() stuff and use per-device readahead limit everywhere. Also, using right readahead size for RAID disks can significantly increase i/o performance: before: dd if=/dev/md2 of=/dev/null bs=100M count=100 100+0 records in 100+0 records out 10485760000 bytes (10 GB) copied, 12.9741 s, 808 MB/s after: $ dd if=/dev/md2 of=/dev/null bs=100M count=100 100+0 records in 100+0 records out 10485760000 bytes (10 GB) copied, 8.91317 s, 1.2 GB/s (It's an 8-disks RAID5 storage). This patch doesn't change sys_readahead and madvise(MADV_WILLNEED) behavior introduced by 6d2be915e589b58 ("mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages"). Signed-off-by: Roman Gushchin <[email protected]> Cc: Raghavendra K T <[email protected]> Cc: Jan Kara <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: David Rientjes <[email protected]> Cc: onstantin Khlebnikov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/page_alloc: remove unused parameter in init_currently_empty_zone()Yaowei Bai3-8/+5
Commit a2f3aa025766 ("[PATCH] Fix sparsemem on Cell") fixed an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime by introducing an 'enum memmap_context' parameter to memmap_init_zone() and init_currently_empty_zone(). This parameter is intended to be used to tell whether the call of these two functions is being made on behalf of a hotplug event, or happening at boot-time. However, init_currently_empty_zone() does not use this parameter at all, so remove it. Signed-off-by: Yaowei Bai <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm, migrate: count pages failing all retries in vmstat and tracepointVlastimil Babka1-1/+2
Migration tries up to 10 times to migrate pages that return -EAGAIN until it gives up. If some pages fail all retries, they are counted towards the number of failed pages that migrate_pages() returns. They should also be counted in the /proc/vmstat pgmigrate_fail and in the mm_migrate_pages tracepoint. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/memblock: make memblock_remove_range() staticAlexander Kuleshov2-5/+1
memblock_remove_range() is only used in the mm/memblock.c, so we can make it static. Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/mremap: use offset_in_page macroAlexander Kuleshov1-6/+6
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/mmap: use offset_in_page macroAlexander Kuleshov1-6/+6
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/vmalloc: use offset_in_page macroAlexander Kuleshov1-6/+6
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/mlock: use offset_in_page macroAlexander Kuleshov1-3/+3
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/util: use offset_in_page macroAlexander Kuleshov1-1/+1
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/percpu: use offset_in_page macroAlexander Kuleshov1-5/+5
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/early_ioremap: use offset_in_page macroAlexander Kuleshov1-3/+3
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/mincore: use offset_in_page macroAlexander Kuleshov1-1/+1
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/nommu: use offset_in_page macroAlexander Kuleshov1-4/+4
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/msync: use offset_in_page macroAlexander Kuleshov1-1/+1
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: Alexander Kuleshov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05arch/powerpc/mm/numa.c: do not allocate bootmem memory for non existing nodesRaghavendra K T1-1/+1
With the setup_nr_nodes(), we have already initialized node_possible_map. So it is safe to use for_each_node here. There are many places in the kernel that use hardcoded 'for' loop with nr_node_ids, because all other architectures have numa nodes populated serially. That should be reason we had maintained the same for powerpc. But, since sparse numa node ids possible on powerpc, we unnecessarily allocate memory for non existent numa nodes. For e.g., on a system with 0,1,16,17 as numa nodes nr_node_ids=18 and we allocate memory for nodes 2-14. This patch we allocate memory for only existing numa nodes. The patch is boot tested on a 4 node tuleta, confirming with printks that it works as expected. Signed-off-by: Raghavendra K T <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Nishanth Aravamudan <[email protected]> Cc: Greg Kurz <[email protected]> Cc: Grant Likely <[email protected]> Cc: Nikunj A Dadhania <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/list_lru.c: replace nr_node_ids for loop with for_each_node()Raghavendra K T1-11/+23
The functions used in the patch are in slowpath, which gets called whenever alloc_super is called during mounts. Though this should not make difference for the architectures with sequential numa node ids, for the powerpc which can potentially have sparse node ids (for e.g., 4 node system having numa ids, 0,1,16,17 is common), this patch saves some unnecessary allocations for non existing numa nodes. Even without that saving, perhaps patch makes code more readable. [[email protected]: take memcg_aware check outside for_each loop] Signed-off-by: Raghavendra K T <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Nishanth Aravamudan <[email protected]> Cc: Greg Kurz <[email protected]> Cc: Grant Likely <[email protected]> Cc: Nikunj A Dadhania <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm: fix docbook comment for get_vaddr_frames()Jonathan Corbet1-1/+1
get_vaddr_frames() has a comment that's *almost* a docbook comment; add the missing star so that the tools will find it properly. Signed-off-by: Jonathan Corbet <[email protected]> Cc: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05memcg: drop unnecessary cold-path tests from __memcg_kmem_bypass()Tejun Heo1-14/+0
__memcg_kmem_bypass() decides whether a kmem allocation should be bypassed to the root memcg. Some conditions that it tests are valid criteria regarding who should be held accountable; however, there are a couple unnecessary tests for cold paths - __GFP_FAIL and fatal_signal_pending(). The previous patch updated try_charge() to handle both __GFP_FAIL and dying tasks correctly and the only thing these two tests are doing is making accounting less accurate and sprinkling tests for cold path conditions in the hot paths. There's nothing meaningful gained by these extra tests. This patch removes the two unnecessary tests from __memcg_kmem_bypass(). Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05memcg: ratify and consolidate over-charge handlingTejun Heo1-49/+20
try_charge() is the main charging logic of memcg. When it hits the limit but either can't fail the allocation due to __GFP_NOFAIL or the task is likely to free memory very soon, being OOM killed, has SIGKILL pending or exiting, it "bypasses" the charge to the root memcg and returns -EINTR. While this is one approach which can be taken for these situations, it has several issues. * It unnecessarily lies about the reality. The number itself doesn't go over the limit but the actual usage does. memcg is either forced to or actively chooses to go over the limit because that is the right behavior under the circumstances, which is completely fine, but, if at all avoidable, it shouldn't be misrepresenting what's happening by sneaking the charges into the root memcg. * Despite trying, we already do over-charge. kmemcg can't deal with switching over to the root memcg by the point try_charge() returns -EINTR, so it open-codes over-charing. * It complicates the callers. Each try_charge() user has to handle the weird -EINTR exception. memcg_charge_kmem() does the manual over-charging. mem_cgroup_do_precharge() performs unnecessary uncharging of root memcg, which BTW is inconsistent with what memcg_charge_kmem() does but not broken as [un]charging are noops on root memcg. mem_cgroup_try_charge() needs to switch the returned cgroup to the root one. The reality is that in memcg there are cases where we are forced and/or willing to go over the limit. Each such case needs to be scrutinized and justified but there definitely are situations where that is the right thing to do. We alredy do this but with a superficial and inconsistent disguise which leads to unnecessary complications. This patch updates try_charge() so that it over-charges and returns 0 when deemed necessary. -EINTR return is removed along with all special case handling in the callers. While at it, remove the local variable @ret, which was initialized to zero and never changed, along with done: label which just returned the always zero @ret. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05memcg: collect kmem bypass conditions into __memcg_kmem_bypass()Tejun Heo1-24/+22
memcg_kmem_newpage_charge() and memcg_kmem_get_cache() are testing the same series of conditions to decide whether to bypass kmem accounting. Collect the tests into __memcg_kmem_bypass(). This is pure refactoring. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05memcg: punt high overage reclaim to return-to-userland pathTejun Heo4-8/+51
Currently, try_charge() tries to reclaim memory synchronously when the high limit is breached; however, if the allocation doesn't have __GFP_WAIT, synchronous reclaim is skipped. If a process performs only speculative allocations, it can blow way past the high limit. This is actually easily reproducible by simply doing "find /". slab/slub allocator tries speculative allocations first, so as long as there's memory which can be consumed without blocking, it can keep allocating memory regardless of the high limit. This patch makes try_charge() always punt the over-high reclaim to the return-to-userland path. If try_charge() detects that high limit is breached, it adds the overage to current->memcg_nr_pages_over_high and schedules execution of mem_cgroup_handle_over_high() which performs synchronous reclaim from the return-to-userland path. As long as kernel doesn't have a run-away allocation spree, this should provide enough protection while making kmemcg behave more consistently. It also has the following benefits. - All over-high reclaims can use GFP_KERNEL regardless of the specific gfp mask in use, e.g. GFP_NOFS, when the limit was breached. - It copes with prio inversion. Previously, a low-prio task with small memory.high might perform over-high reclaim with a bunch of locks held. If a higher prio task needed any of these locks, it would have to wait until the low prio task finished reclaim and released the locks. By handing over-high reclaim to the task exit path this issue can be avoided. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05memcg: flatten task_struct->memcg_oomTejun Heo3-20/+19
task_struct->memcg_oom is a sub-struct containing fields which are used for async memcg oom handling. Most task_struct fields aren't packaged this way and it can lead to unnecessary alignment paddings. This patch flattens it. * task.memcg_oom.memcg -> task.memcg_in_oom * task.memcg_oom.gfp_mask -> task.memcg_oom_gfp_mask * task.memcg_oom.order -> task.memcg_oom_order * task.memcg_oom.may_oom -> task.memcg_may_oom In addition, task.memcg_may_oom is relocated to where other bitfields are which reduces the size of task_struct. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/mmap.c: remove useless statement "vma = NULL" in find_vma()Chen Gang1-1/+0
Before the main loop, vma is already is NULL. There is no need to set it to NULL again. Signed-off-by: Chen Gang <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05uaccess: reimplement probe_kernel_address() using probe_kernel_read()Andrew Morton4-32/+17
probe_kernel_address() is basically the same as the (later added) probe_kernel_read(). The return value on EFAULT is a bit different: probe_kernel_address() returns number-of-bytes-not-copied whereas probe_kernel_read() returns -EFAULT. All callers have been checked, none cared. probe_kernel_read() can be overridden by the architecture whereas probe_kernel_address() cannot. parisc, blackfin and um do this, to insert additional checking. Hence this patch possibly fixes obscure bugs, although there are only two probe_kernel_address() callsites outside arch/. My first attempt involved removing probe_kernel_address() entirely and converting all callsites to use probe_kernel_read() directly, but that got tiresome. This patch shrinks mm/slab_common.o by 218 bytes. For a single probe_kernel_address() callsite. Cc: Steven Miao <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/mlock.c: reorganize mlockall() return values and remove goto-out labelAlexey Klimov1-5/+4
In mlockall syscall wrapper after out-label for goto code just doing return. Remove goto out statements and return error values directly. Also instead of rewriting ret variable before every if-check move returns to 'error'-like path under if-check. Objdump asm listing showed me reducing by few asm lines. Object file size descreased from 220592 bytes to 220528 bytes for me (for aarch64). Signed-off-by: Alexey Klimov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/kmemleak.c: remove unneeded initialization of object to NULLAlexey Klimov1-1/+1
Few lines below object is reinitialized by lookup_object() so we don't need to init it by NULL in the beginning of find_and_get_object(). Signed-off-by: Alexey Klimov <[email protected]> Acked-by: Catalin Marinas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm: slab: only move management objects off-slab for sizes larger than ↵Catalin Marinas1-2/+3
KMALLOC_MIN_SIZE On systems with a KMALLOC_MIN_SIZE of 128 (arm64, some mips and powerpc configurations defining ARCH_DMA_MINALIGN to 128), the first kmalloc_caches[] entry to be initialised after slab_early_init = 0 is "kmalloc-128" with index 7. Depending on the debug kernel configuration, sizeof(struct kmem_cache) can be larger than 128 resulting in an INDEX_NODE of 8. Commit 8fc9cf420b36 ("slab: make more slab management structure off the slab") enables off-slab management objects for sizes starting with PAGE_SIZE >> 5 (128 bytes for a 4KB page configuration) and the creation of the "kmalloc-128" cache would try to place the management objects off-slab. However, since KMALLOC_MIN_SIZE is already 128 and freelist_size == 32 in __kmem_cache_create(), kmalloc_slab(freelist_size) returns NULL (kmalloc_caches[7] not populated yet). This triggers the following bug on arm64: kernel BUG at /work/Linux/linux-2.6-aarch64/mm/slab.c:2283! Internal error: Oops - BUG: 0 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc4+ #540 Hardware name: Juno (DT) PC is at __kmem_cache_create+0x21c/0x280 LR is at __kmem_cache_create+0x210/0x280 [...] Call trace: __kmem_cache_create+0x21c/0x280 create_boot_cache+0x48/0x80 create_kmalloc_cache+0x50/0x88 create_kmalloc_caches+0x4c/0xf4 kmem_cache_init+0x100/0x118 start_kernel+0x214/0x33c This patch introduces an OFF_SLAB_MIN_SIZE definition to avoid off-slab management objects for sizes equal to or smaller than KMALLOC_MIN_SIZE. Fixes: 8fc9cf420b36 ("slab: make more slab management structure off the slab") Signed-off-by: Catalin Marinas <[email protected]> Reported-by: Geert Uytterhoeven <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> [3.15+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/slub: calculate start order with reserved in considerationWei Yang1-5/+1
In slub_order(), the order starts from max(min_order, get_order(min_objects * size)). When (min_objects * size) has different order from (min_objects * size + reserved), it will skip this order via a check in the loop. This patch optimizes this a little by calculating the start order with `reserved' in consideration and removing the check in loop. Signed-off-by: Wei Yang <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/slub: use get_order() instead of fls()Wei Yang1-2/+1
get_order() is more easy to understand. This patch just replaces it. Signed-off-by: Wei Yang <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/slub: correct the comment in calculate_order()Wei Yang1-1/+1
In calculate_order(), it tries to calculate the best order by adjusting the fraction and min_objects. On each iteration on min_objects, fraction iterates on 16, 8, 4. Which means the acceptable waste increases with 1/16, 1/8, 1/4. This patch corrects the comment according to the code. Signed-off-by: Wei Yang <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/slab_common.c: initialize kmem_cache pointer to NULLAlexandru Moise1-2/+1
The assignment to NULL within the error condition was written in a 2014 patch to suppress a compiler warning. However it would be cleaner to just initialize the kmem_cache to NULL and just return it in case of an error condition. Signed-off-by: Alexandru Moise <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05Doc/slub: document slabinfo-gnuplot.sh scriptSergey Senozhatsky1-0/+59
Add documentation on how to use slabinfo-gnuplot.sh script. Signed-off-by: Sergey Senozhatsky <[email protected]> Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05tools/vm/slabinfo: gnuplot slabifo extended statSergey Senozhatsky1-0/+275
GNUplot `slabinfo -X' stats, collected, for example, using the following command: while [ 1 ]; do slabinfo -X >> stats; sleep 1; done `slabinfo-gnuplot.sh stats' pre-processes collected records and generate graphs (totals, slabs sorted by size, slabs sorted by size). Graphs can be [individually] regenerate with different samples range and graph width-heigh (-r %d,%d and -s %d,%d options). To visually compare N `totals' graphs: slabinfo-gnuplot.sh -t FILE1-totals FILE2-totals ... FILEN-totals Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05tools/vm/slabinfo: cosmetic globals cleanupSergey Senozhatsky1-27/+27
checkpatch.pl complains about globals being explicitly zeroed out: "ERROR: do not initialise globals to 0 or NULL". New globals, introduced in this patch set, have no explicit 0 initialization; clean up the old ones to make it less hairy. Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05tools/vm/slabinfo: output sizes in bytesSergey Senozhatsky1-37/+51
Introduce "-B|--Bytes" opt to disable store_size() dynamic size scaling and report size in bytes instead. This `expands' the interface a bit, it's impossible to use printf("%6s") anymore to output sizes. Example: slabinfo -X -N 2 Slabcache Totals ---------------- Slabcaches : 91 Aliases : 119->69 Active: 63 Memory used: 199798784 # Loss : 10689376 MRatio: 5% # Objects : 324301 # PartObj: 18151 ORatio: 5% Per Cache Average Min Max Total ---------------------------------------------------------------------------- #Objects 5147 1 89068 324301 #Slabs 199 1 3886 12537 #PartSlab 12 0 240 778 %PartSlab 32% 0% 100% 6% PartObjs 5 0 4569 18151 % PartObj 26% 0% 100% 5% Memory 3171409 8192 127336448 199798784 Used 3001736 160 121429728 189109408 Loss 169672 0 5906720 10689376 Per Object Average Min Max ----------------------------------------------------------- Memory 585 8 8192 User 583 8 8192 Loss 2 0 64 Slabs sorted by size -------------------- Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg ext4_inode_cache 69948 1736 127336448 3871/0/15 18 3 0 95 a dentry 89068 288 26058752 3164/0/17 28 1 0 98 a Slabs sorted by loss -------------------- Name Objects Objsize Loss Slabs/Part/Cpu O/S O %Fr %Ef Flg ext4_inode_cache 69948 1736 5906720 3871/0/15 18 3 0 95 a inode_cache 11628 864 537472 642/0/4 18 2 0 94 a Besides, store_size() does not use powers of two for G/M/K if (value > 1000000000UL) { divisor = 100000000UL; trailer = 'G'; } else if (value > 1000000UL) { divisor = 100000UL; trailer = 'M'; } else if (value > 1000UL) { divisor = 100; trailer = 'K'; } Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05tools/vm/slabinfo: introduce extended totals modeSergey Senozhatsky1-10/+44
Add "-X|--Xtotals" opt to output extended totals summary, which includes: -- totals summary -- slabs sorted by size -- slabs sorted by loss (waste) Example: ======= slabinfo --X -N 1 Slabcache Totals ---------------- Slabcaches : 91 Aliases : 120->69 Active: 65 Memory used: 568.3M # Loss : 30.4M MRatio: 5% # Objects : 920.1K # PartObj: 161.2K ORatio: 17% Per Cache Average Min Max Total --------------------------------------------------------- #Objects 14.1K 1 227.8K 920.1K #Slabs 533 1 11.7K 34.7K #PartSlab 86 0 4.3K 5.6K %PartSlab 24% 0% 100% 16% PartObjs 17 0 129.3K 161.2K % PartObj 17% 0% 100% 17% Memory 8.7M 8.1K 384.7M 568.3M Used 8.2M 160 366.5M 537.9M Loss 468.8K 0 18.2M 30.4M Per Object Average Min Max --------------------------------------------- Memory 587 8 8.1K User 584 8 8.1K Loss 2 0 64 Slabs sorted by size ---------------------- Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg ext4_inode_cache 211142 1736 384.7M 11732/40/10 18 3 0 95 a Slabs sorted by loss ---------------------- Name Objects Objsize Loss Slabs/Part/Cpu O/S O %Fr %Ef Flg ext4_inode_cache 211142 1736 18.2M 11732/40/10 18 3 0 95 a Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05tools/vm/slabinfo: fix alternate opts namesSergey Senozhatsky1-3/+5
Fix mismatches between usage() output and real opts[] options. Add missing alternative opt names, e.g., '-S' had no '--Size' opts[] entry, etc. Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05tools/vm/slabinfo: sort slabs by lossSergey Senozhatsky1-4/+21
Introduce opt "-L|--sort-loss" to sort and output slabs by loss (waste) in slabcache(). Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05tools/vm/slabinfo: limit the number of reported slabsSergey Senozhatsky1-3/+15
Introduce opt "-N|--lines=K" to limit the number of slabs being reported in output_slabs(). Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05tools/vm/slabinfo: use getopt no_argument/optional_argumentSergey Senozhatsky1-17/+17
This patchset adds 'extended' slabinfo mode that provides additional information: -- totals summary -- slabs sorted by size -- slabs sorted by loss (waste) The patches also introduces several new slabinfo options to limit the number of slabs reported, sort slabs by loss (waste); and some fixes. Extended output example (slabinfo -X -N 2): Slabcache Totals ---------------- Slabcaches : 91 Aliases : 119->69 Active: 63 Memory used: 199798784 # Loss : 10689376 MRatio: 5% # Objects : 324301 # PartObj: 18151 ORatio: 5% Per Cache Average Min Max Total ---------------------------------------------------------------------------- #Objects 5147 1 89068 324301 #Slabs 199 1 3886 12537 #PartSlab 12 0 240 778 %PartSlab 32% 0% 100% 6% PartObjs 5 0 4569 18151 % PartObj 26% 0% 100% 5% Memory 3171409 8192 127336448 199798784 Used 3001736 160 121429728 189109408 Loss 169672 0 5906720 10689376 Per Object Average Min Max ----------------------------------------------------------- Memory 585 8 8192 User 583 8 8192 Loss 2 0 64 Slabs sorted by size -------------------- Name Objects Objsize Space Slabs/Part/Cpu O/S O %Fr %Ef Flg ext4_inode_cache 69948 1736 127336448 3871/0/15 18 3 0 95 a dentry 89068 288 26058752 3164/0/17 28 1 0 98 a Slabs sorted by loss -------------------- Name Objects Objsize Loss Slabs/Part/Cpu O/S O %Fr %Ef Flg ext4_inode_cache 69948 1736 5906720 3871/0/15 18 3 0 95 a inode_cache 11628 864 537472 642/0/4 18 2 0 94 a The last patch in the series addresses Linus' comment from http://marc.info/?l=linux-mm&m=144148518703321&w=2 (well, it's been some time. sorry.) gnuplot script takes the slabinfo records file, where every record is a `slabinfo -X' output. So the basic workflow is, for example, as follows: while [ 1 ]; do slabinfo -X -N 2 >> stats; sleep 1; done ^C slabinfo-gnuplot.sh stats The last command will produce 3 png files (and 3 stats files) -- graph of slabinfo totals -- graph of slabs by size -- graph of slabs by loss It's also possible to select a range of records for plotting (a range of collected slabinfo outputs) via `-r 10,100` (for example); and compare totals from several measurements (to visially compare slabs behaviour (10,50 range)) using pre-parsed totals files: slabinfo-gnuplot.sh -r 10,50 -t stats-totals1 .. stats-totals2 This also, technically, supports ktest. Upload new slabinfo to target, collect the stats and give the resulting stats file to slabinfo-gnuplot This patch (of 8): Use getopt constants in `struct option' ->has_arg instead of numerical representations. Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/slab_common.c: do not warn that cache is busy on destroy more than onceVladimir Davydov1-6/+7
Currently, when kmem_cache_destroy() is called for a global cache, we print a warning for each per memcg cache attached to it that has active objects (see shutdown_cache). This is redundant, because it gives no new information and only clutters the log. If a cache being destroyed has active objects, there must be a memory leak in the module that created the cache, and it does not matter if the cache was used by users in memory cgroups or not. This patch moves the warning from shutdown_cache(), which is called for shutting down both global and per memcg caches, to kmem_cache_destroy(), so that the warning is only printed once if there are objects left in the cache being destroyed. Signed-off-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/slab_common.c: clear pointers to per memcg caches on destroyVladimir Davydov2-21/+78
Currently, we do not clear pointers to per memcg caches in the memcg_params.memcg_caches array when a global cache is destroyed with kmem_cache_destroy. This is fine if the global cache does get destroyed. However, a cache can be left on the list if it still has active objects when kmem_cache_destroy is called (due to a memory leak). If this happens, the entries in the array will point to already freed areas, which is likely to result in data corruption when the cache is reused (via slab merging). Signed-off-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05mm/slab_common.c: rename cache create/destroy helpersVladimir Davydov1-19/+18
do_kmem_cache_create(), do_kmem_cache_shutdown(), and do_kmem_cache_release() sound awkward for static helper functions that are not supposed to be used outside slab_common.c. Rename them to create_cache(), shutdown_cache(), and release_caches(), respectively. This patch is a pure cleanup and does not introduce any functional changes. Signed-off-by: Vladimir Davydov <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05include/linux/compiler-gcc.h: hide assume_aligned attribute from sparseRasmus Villemoes1-1/+1
The patch "slab.h: sprinkle __assume_aligned attributes" causes *tons* of whinges if you do 'make C=2' with sparse 0.5.0: CHECK drivers/media/usb/pwc/pwc-if.c include/linux/slab.h:307:43: error: attribute '__assume_aligned__': unknown attribute include/linux/slab.h:308:58: error: attribute '__assume_aligned__': unknown attribute include/linux/slab.h:337:73: error: attribute '__assume_aligned__': unknown attribute include/linux/slab.h:375:74: error: attribute '__assume_aligned__': unknown attribute include/linux/slab.h:378:80: error: attribute '__assume_aligned__': unknown attribute sparse apparently pretends to be gcc >= 4.9, yet isn't prepared to handle all the function attributes supported by those gccs and complains loudly. So hide the definition of __assume_aligned from it (so that the generic one in compiler.h gets used). Signed-off-by: Rasmus Villemoes <[email protected]> Reported-by: Valdis Kletnieks <[email protected]> Tested-By: Valdis Kletnieks <[email protected]> Cc: Christopher Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05compiler.h: add support for function attribute assume_alignedRasmus Villemoes2-0/+25
gcc 4.9 added the function attribute assume_aligned, indicating to the caller that the returned pointer may be assumed to have a certain minimal alignment. This is useful if, for example, the return value is passed to memset(). Add a shorthand macro for that. Signed-off-by: Rasmus Villemoes <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05slab: convert slab_is_available() to booleanDenis Kirjanov2-2/+2
A good candidate to return a boolean result. Signed-off-by: Denis Kirjanov <[email protected]> Cc: Christoph Lameter <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05kernel/watchdog.c: fix race between proc_watchdog_thresh() and ↵Ulrich Obergfell1-1/+1
watchdog_timer_fn() Theoretically it is possible that the watchdog timer expires right at the time when a user sets 'watchdog_thresh' to zero (note: this disables the lockup detectors). In this scenario, the is_softlockup() function - which is called by the timer - could produce a false positive. Fix this by checking the current value of 'watchdog_thresh'. Signed-off-by: Ulrich Obergfell <[email protected]> Acked-by: Don Zickus <[email protected]> Reviewed-by: Aaron Tomlin <[email protected]> Cc: Ulrich Obergfell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-05kernel/watchdog.c: remove {get|put}_online_cpus() from ↵Ulrich Obergfell1-4/+6
watchdog_{park|unpark}_threads() watchdog_{park|unpark}_threads() are now called in code paths that protect themselves against CPU hotplug, so {get|put}_online_cpus() calls are redundant and can be removed. Signed-off-by: Ulrich Obergfell <[email protected]> Acked-by: Don Zickus <[email protected]> Reviewed-by: Aaron Tomlin <[email protected]> Cc: Ulrich Obergfell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>