aboutsummaryrefslogtreecommitdiff
path: root/mm/slob.c
AgeCommit message (Collapse)AuthorFilesLines
2020-12-15mm: extract might_alloc() debug checkDaniel Vetter1-4/+2
Extracted from slab.h, which seems to have the most complete version including the correct might_sleep() check. Roll it out to slob.c. Motivated by a discussion with Paul about possibly changing call_rcu behaviour to allocate memory, but only roughly every 500th call. There are a lot fewer places in the kernel that care about whether allocating memory is allowed or not (due to deadlocks with reclaim code) than places that care whether sleeping is allowed. But debugging these also tends to be a lot harder, so nice descriptive checks could come in handy. I might have some use eventually for annotations in drivers/gpu. Note that unlike fs_reclaim_acquire/release gfpflags_allow_blocking does not consult the PF_MEMALLOC flags. But there is no flag equivalent for GFP_NOWAIT, hence this check can't go wrong due to memalloc_no*_save/restore contexts. Willy is working on a patch series which might change this: https://lore.kernel.org/linux-mm/[email protected]/ I think best would be if that updates gfpflags_allow_blocking(), since there's a ton of callers all over the place for that already. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Daniel Vetter <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Waiman Long <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Qian Cai <[email protected]> Cc: "Matthew Wilcox (Oracle)" <[email protected]> Cc: Christian König <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Thomas Hellström (Intel) <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm: memcg: convert vmstat slab counters to bytesRoman Gushchin1-6/+6
In order to prepare for per-object slab memory accounting, convert NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat items to bytes. To make it obvious, rename them to NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to NR_KERNEL_STACK_KB). Internally global and per-node counters are stored in pages, however memcg and lruvec counters are stored in bytes. This scheme may look weird, but only for now. As soon as slab pages will be shared between multiple cgroups, global and node counters will reflect the total number of slab pages. However memcg and lruvec counters will be used for per-memcg slab memory tracking, which will take separate kernel objects in the account. Keeping global and node counters in pages helps to avoid additional overhead. The size of slab memory shouldn't exceed 4Gb on 32-bit machines, so it will fit into atomic_long_t we use for vmstats. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-03-26mm/sl[uo]b: export __kmalloc_track(_node)_callerDaniel Vetter1-0/+2
slab does this already, and I want to use this in a memory allocation tracker in drm for stuff that's tied to the lifetime of a drm_device, not the underlying struct device. Kinda like devres, but for drm. Acked-by: Andrew Morton <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-10-07mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)Vlastimil Babka1-11/+31
In most configurations, kmalloc() happens to return naturally aligned (i.e. aligned to the block size itself) blocks for power of two sizes. That means some kmalloc() users might unknowingly rely on that alignment, until stuff breaks when the kernel is built with e.g. CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then developers have to devise workaround such as own kmem caches with specified alignment [1], which is not always practical, as recently evidenced in [2]. The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()' variant would not help with code unknowingly relying on the implicit alignment. For slab implementations it would either require creating more kmalloc caches, or allocate a larger size and only give back part of it. That would be wasteful, especially with a generic alignment parameter (in contrast with a fixed alignment to size). Ideally we should provide to mm users what they need without difficult workarounds or own reimplementations, so let's make the kmalloc() alignment to size explicitly guaranteed for power-of-two sizes under all configurations. What this means for the three available allocators? * SLAB object layout happens to be mostly unchanged by the patch. The implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for caches with alignment larger than unsigned long long. Practically on at least x86 this includes kmalloc caches as they use cache line alignment, which is larger than that. Still, this patch ensures alignment on all arches and cache sizes. * SLUB layout is also unchanged unless redzoning is enabled through CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With this patch, explicit alignment is guaranteed with redzoning as well. This will result in more memory being wasted, but that should be acceptable in a debugging scenario. * SLOB has no implicit alignment so this patch adds it explicitly for kmalloc(). The potential downside is increased fragmentation. While pathological allocation scenarios are certainly possible, in my testing, after booting a x86_64 kernel+userspace with virtme, around 16MB memory was consumed by slab pages both before and after the patch, with difference in the noise. [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/ [2] https://lore.kernel.org/linux-fsdevel/[email protected]/ [3] https://lwn.net/Articles/787740/ [[email protected]: documentation fixlet, per Matthew] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Cc: David Sterba <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ming Lei <[email protected]> Cc: Dave Chinner <[email protected]> Cc: "Darrick J . Wong" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: James Bottomley <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-07mm, sl[ou]b: improve memory accountingVlastimil Babka1-4/+16
Patch series "guarantee natural alignment for kmalloc()", v2. This patch (of 2): SLOB currently doesn't account its pages at all, so in /proc/meminfo the Slab field shows zero. Modifying a counter on page allocation and freeing should be acceptable even for the small system scenarios SLOB is intended for. Since reclaimable caches are not separated in SLOB, account everything as unreclaimable. SLUB currently doesn't account kmalloc() and kmalloc_node() allocations larger than order-1 page, that are passed directly to the page allocator. As they also don't appear in /proc/slabinfo, it might look like a memory leak. For consistency, account them as well. (SLAB doesn't actually use page allocator directly, so no change there). Ideally SLOB and SLUB would be handled in separate patches, but due to the shared kmalloc_order() function and different kfree() implementations, it's easier to patch both at once to prevent inconsistencies. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ming Lei <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: "Darrick J . Wong" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: James Bottomley <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-24mm: introduce page_size()Matthew Wilcox (Oracle)1-1/+1
Patch series "Make working with compound pages easier", v2. These three patches add three helpers and convert the appropriate places to use them. This patch (of 3): It's unnecessarily hard to find out the size of a potentially huge page. Replace 'PAGE_SIZE << compound_order(page)' with page_size(page). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/slab: refactor common ksize KASAN logic into slab_common.cMarco Elver1-2/+2
This refactors common code of ksize() between the various allocators into slab_common.c: __ksize() is the allocator-specific implementation without instrumentation, whereas ksize() includes the required KASAN logic. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Marco Elver <[email protected]> Acked-by: Christoph Lameter <[email protected]> Reviewed-by: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14slob: use slab_list instead of lruTobin C. Harding1-6/+6
Currently we use the page->lru list for maintaining lists of slabs. We have a list_head in the page structure (slab_list) that can be used for this purpose. Doing so makes the code cleaner since we are not overloading the lru list. The slab_list is part of a union within the page struct (included here stripped down): union { struct { /* Page cache and anonymous pages */ struct list_head lru; ... }; struct { dma_addr_t dma_addr; }; struct { /* slab, slob and slub */ union { struct list_head slab_list; struct { /* Partial pages */ struct page *next; int pages; /* Nr of pages left */ int pobjects; /* Approximate count */ }; }; ... Here we see that slab_list and lru are the same bits. We can verify that this change is safe to do by examining the object file produced from slob.c before and after this patch is applied. Steps taken to verify: 1. checkout current tip of Linus' tree commit a667cb7a94d4 ("Merge branch 'akpm' (patches from Andrew)") 2. configure and build (select SLOB allocator) CONFIG_SLOB=y CONFIG_SLAB_MERGE_DEFAULT=y 3. dissasemble object file `objdump -dr mm/slub.o > before.s 4. apply patch 5. build 6. dissasemble object file `objdump -dr mm/slub.o > after.s 7. diff before.s after.s Use slab_list list_head instead of the lru list_head for maintaining lists of slabs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobin C. Harding <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-05-14slob: respect list_head abstraction layerTobin C. Harding1-14/+37
Currently we reach inside the list_head. This is a violation of the layer of abstraction provided by the list_head. It makes the code fragile. More importantly it makes the code wicked hard to understand. The code reaches into the list_head structure to counteract the fact that the list _may_ have been changed during slob_page_alloc(). Instead of this we can add a return parameter to slob_page_alloc() to signal that the list was modified (list_del() called with page->lru to remove page from the freelist). This code is concerned with an optimisation that counters the tendency for first fit allocation algorithm to fragment memory into many small chunks at the front of the memory pool. Since the page is only removed from the list when an allocation uses _all_ the remaining memory in the page then in this special case fragmentation does not occur and we therefore do not need the optimisation. Add a return parameter to slob_page_alloc() to signal that the allocation used up the whole page and that the page was removed from the free list. After calling slob_page_alloc() check the return value just added and only attempt optimisation if the page is still on the list. Use list_head API instead of reaching into the list_head structure to check if sp is at the front of the list. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobin C. Harding <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-06-07slab: __GFP_ZERO is incompatible with a constructorMatthew Wilcox1-1/+3
__GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor. Link: http://lkml.kernel.org/r/[email protected] Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators") Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-11-15slab, slub, slob: add slab_flags_tAlexey Dobriyan1-1/+1
Add sparse-checked slab_flags_t for struct kmem_cache::flags (SLAB_POISON, etc). SLAB is bloated temporarily by switching to "unsigned long", but only temporarily. Link: http://lkml.kernel.org/r/20171021100225.GA22428@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-11-15mm/slob.c: remove an unnecessary check for __GFP_ZEROMiles Chen1-1/+1
Current flow guarantees a valid pointer when handling the __GFP_ZERO case. So remove the unnecessary NULL pointer check. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Miles Chen <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman1-0/+1
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <[email protected]> Reviewed-by: Philippe Ombredanne <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2017-08-10locking/lockdep: Rework FS_RECLAIM annotationPeter Zijlstra1-2/+4
A while ago someone, and I cannot find the email just now, asked if we could not implement the RECLAIM_FS inversion stuff with a 'fake' lock like we use for other things like workqueues etc. I think this should be possible which allows reducing the 'irq' states and will reduce the amount of __bfs() lookups we do. Removing the 1 IRQ state results in 4 less __bfs() walks per dependency, improving lockdep performance. And by moving this annotation out of the lockdep code it becomes easier for the mm people to extend. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Byungchul Park <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Nikolay Borisov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-04-18mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCUPaul E. McKenney1-3/+3
A group of Linux kernel hackers reported chasing a bug that resulted from their assumption that SLAB_DESTROY_BY_RCU provided an existence guarantee, that is, that no block from such a slab would be reallocated during an RCU read-side critical section. Of course, that is not the case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire slab of blocks. However, there is a phrase for this, namely "type safety". This commit therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order to avoid future instances of this sort of confusion. Signed-off-by: Paul E. McKenney <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Vlastimil Babka <[email protected]> [ paulmck: Add comments mentioning the old name, as requested by Eric Dumazet, in order to help people familiar with the old name find the new one. ] Acked-by: David Rientjes <[email protected]>
2016-12-12slub: move synchronize_sched out of slab_mutex on shrinkVladimir Davydov1-1/+1
synchronize_sched() is a heavy operation and calling it per each cache owned by a memory cgroup being destroyed may take quite some time. What is worse, it's currently called under the slab_mutex, stalling all works doing cache creation/destruction. Actually, there isn't much point in calling synchronize_sched() for each cache - it's enough to call it just once - after setting cpu_partial for all caches and before shrinking them. This way, we can also move it out of the slab_mutex, which we have to hold for iterating over the slab cache list. Link: https://bugzilla.kernel.org/show_bug.cgi?id=172991 Link: http://lkml.kernel.org/r/0a10d71ecae3db00fb4421bcd3f82bcc911f4be4.1475329751.git.vdavydov.dev@gmail.com Signed-off-by: Vladimir Davydov <[email protected]> Reported-by: Doug Smythies <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-02-18mm: slab: free kmem_cache_node after destroy sysfs fileDmitry Safonov1-0/+4
When slub_debug alloc_calls_show is enabled we will try to track location and user of slab object on each online node, kmem_cache_node structure and cpu_cache/cpu_slub shouldn't be freed till there is the last reference to sysfs file. This fixes the following panic: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: list_locations+0x169/0x4e0 PGD 257304067 PUD 438456067 PMD 0 Oops: 0000 [#1] SMP CPU: 3 PID: 973074 Comm: cat ve: 0 Not tainted 3.10.0-229.7.2.ovz.9.30-00007-japdoll-dirty #2 9.30 Hardware name: DEPO Computers To Be Filled By O.E.M./H67DE3, BIOS L1.60c 07/14/2011 task: ffff88042a5dc5b0 ti: ffff88037f8d8000 task.ti: ffff88037f8d8000 RIP: list_locations+0x169/0x4e0 Call Trace: alloc_calls_show+0x1d/0x30 slab_attr_show+0x1b/0x30 sysfs_read_file+0x9a/0x1a0 vfs_read+0x9c/0x170 SyS_read+0x58/0xb0 system_call_fastpath+0x16/0x1b Code: 5e 07 12 00 b9 00 04 00 00 3d 00 04 00 00 0f 4f c1 3d 00 04 00 00 89 45 b0 0f 84 c3 00 00 00 48 63 45 b0 49 8b 9c c4 f8 00 00 00 <48> 8b 43 20 48 85 c0 74 b6 48 89 df e8 46 37 44 00 48 8b 53 10 CR2: 0000000000000020 Separated __kmem_cache_release from __kmem_cache_shutdown which now called on slab_kmem_cache_release (after the last reference to sysfs file object has dropped). Reintroduced locking in free_partial as sysfs file might access cache's partial list after shutdowning - partial revert of the commit 69cb8e6b7c29 ("slub: free slabs without holding locks"). Zap __remove_partial and use remove_partial (w/o underscores) as free_partial now takes list_lock which s partial revert for commit 1e4dd9461fab ("slub: do not assert not having lock in removing freed partial") Signed-off-by: Dmitry Safonov <[email protected]> Suggested-by: Vladimir Davydov <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-11-22slab/slub: adjust kmem_cache_alloc_bulk APIJesper Dangaard Brouer1-1/+1
Adjust kmem_cache_alloc_bulk API before we have any real users. Adjust API to return type 'int' instead of previously type 'bool'. This is done to allow future extension of the bulk alloc API. A future extension could be to allow SLUB to stop at a page boundary, when specified by a flag, and then return the number of objects. The advantage of this approach, would make it easier to make bulk alloc run without local IRQs disabled. With an approach of cmpxchg "stealing" the entire c->freelist or page->freelist. To avoid overshooting we would stop processing at a slab-page boundary. Else we always end up returning some objects at the cost of another cmpxchg. To keep compatible with future users of this API linking against an older kernel when using the new flag, we need to return the number of allocated objects with this API change. Signed-off-by: Jesper Dangaard Brouer <[email protected]> Cc: Vladimir Davydov <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-08mm: rename alloc_pages_exact_node() to __alloc_pages_node()Vlastimil Babka1-2/+2
alloc_pages_exact_node() was introduced in commit 6484eb3e2a81 ("page allocator: do not check NUMA node ID when the caller knows the node is valid") as an optimized variant of alloc_pages_node(), that doesn't fallback to current node for nid == NUMA_NO_NODE. Unfortunately the name of the function can easily suggest that the allocation is restricted to the given node and fails otherwise. In truth, the node is only preferred, unless __GFP_THISNODE is passed among the gfp flags. The misleading name has lead to mistakes in the past, see for example commits 5265047ac301 ("mm, thp: really limit transparent hugepage allocation to local node") and b360edb43f8e ("mm, mempolicy: migrate_to_node should only migrate to node"). Another issue with the name is that there's a family of alloc_pages_exact*() functions where 'exact' means exact size (instead of page order), which leads to more confusion. To prevent further mistakes, this patch effectively renames alloc_pages_exact_node() to __alloc_pages_node() to better convey that it's an optimized variant of alloc_pages_node() not intended for general usage. Both functions get described in comments. It has been also considered to really provide a convenience function for allocations restricted to a node, but the major opinion seems to be that __GFP_THISNODE already provides that functionality and we shouldn't duplicate the API needlessly. The number of users would be small anyway. Existing callers of alloc_pages_exact_node() are simply converted to call __alloc_pages_node(), with the exception of sba_alloc_coherent() which open-codes the check for NUMA_NO_NODE, so it is converted to use alloc_pages_node() instead. This means it no longer performs some VM_BUG_ON checks, and since the current check for nid in alloc_pages_node() uses a 'nid < 0' comparison (which includes NUMA_NO_NODE), it may hide wrong values which would be previously exposed. Both differences will be rectified by the next patch. To sum up, this patch makes no functional changes, except temporarily hiding potentially buggy callers. Restricting the checks in alloc_pages_node() is left for the next patch which can in turn expose more existing buggy callers. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Robin Holt <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: Mel Gorman <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Cliff Whickman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-04slab: infrastructure for bulk object allocation and freeingChristoph Lameter1-0/+13
Add the basic infrastructure for alloc/free operations on pointer arrays. It includes a generic function in the common slab code that is used in this infrastructure patch to create the unoptimized functionality for slab bulk operations. Allocators can then provide optimized allocation functions for situations in which large numbers of objects are needed. These optimization may avoid taking locks repeatedly and bypass metadata creation if all objects in slab pages can be used to provide the objects required. Allocators can extend the skeletons provided and add their own code to the bulk alloc and free functions. They can keep the generic allocation and freeing and just fall back to those if optimizations would not work (like for example when debugging is on). Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-04-14slob: make slob_alloc_node() static and remove EXPORT_SYMBOL()Fabian Frederick1-2/+1
slob_alloc_node() is only used in slob.c. Remove the EXPORT_SYMBOL and make slob_alloc_node() static. Signed-off-by: Fabian Frederick <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-12slub: make dead caches discard free slabs immediatelyVladimir Davydov1-1/+1
To speed up further allocations SLUB may store empty slabs in per cpu/node partial lists instead of freeing them immediately. This prevents per memcg caches destruction, because kmem caches created for a memory cgroup are only destroyed after the last page charged to the cgroup is freed. To fix this issue, this patch resurrects approach first proposed in [1]. It forbids SLUB to cache empty slabs after the memory cgroup that the cache belongs to was destroyed. It is achieved by setting kmem_cache's cpu_partial and min_partial constants to 0 and tuning put_cpu_partial() so that it would drop frozen empty slabs immediately if cpu_partial = 0. The runtime overhead is minimal. From all the hot functions, we only touch relatively cold put_cpu_partial(): we make it call unfreeze_partials() after freezing a slab that belongs to an offline memory cgroup. Since slab freezing exists to avoid moving slabs from/to a partial list on free/alloc, and there can't be allocations from dead caches, it shouldn't cause any overhead. We do have to disable preemption for put_cpu_partial() to achieve that though. The original patch was accepted well and even merged to the mm tree. However, I decided to withdraw it due to changes happening to the memcg core at that time. I had an idea of introducing per-memcg shrinkers for kmem caches, but now, as memcg has finally settled down, I do not see it as an option, because SLUB shrinker would be too costly to call since SLUB does not keep free slabs on a separate list. Besides, we currently do not even call per-memcg shrinkers for offline memcgs. Overall, it would introduce much more complexity to both SLUB and memcg than this small patch. Regarding to SLAB, there's no problem with it, because it shrinks per-cpu/node caches periodically. Thanks to list_lru reparenting, we no longer keep entries for offline cgroups in per-memcg arrays (such as memcg_cache_params->memcg_caches), so we do not have to bother if a per-memcg cache will be shrunk a bit later than it could be. [1] http://thread.gmane.org/gmane.linux.kernel.mm/118649/focus=118650 Signed-off-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-10-09mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller()Joonsoo Kim1-2/+0
Now, we track caller if tracing or slab debugging is enabled. If they are disabled, we could save one argument passing overhead by calling __kmalloc(_node)(). But, I think that it would be marginal. Furthermore, default slab allocator, SLUB, doesn't use this technique so I think that it's okay to change this situation. After this change, we can turn on/off CONFIG_DEBUG_SLAB without full kernel build and remove some complicated '#if' defintion. It looks more benefitial to me. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Zhang Yanfei <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-04slab: get_online_mems for kmem_cache_{create,destroy,shrink}Vladimir Davydov1-2/+1
When we create a sl[au]b cache, we allocate kmem_cache_node structures for each online NUMA node. To handle nodes taken online/offline, we register memory hotplug notifier and allocate/free kmem_cache_node corresponding to the node that changes its state for each kmem cache. To synchronize between the two paths we hold the slab_mutex during both the cache creationg/destruction path and while tuning per-node parts of kmem caches in memory hotplug handler, but that's not quite right, because it does not guarantee that a newly created cache will have all kmem_cache_nodes initialized in case it races with memory hotplug. For instance, in case of slub: CPU0 CPU1 ---- ---- kmem_cache_create: online_pages: __kmem_cache_create: slab_memory_callback: slab_mem_going_online_callback: lock slab_mutex for each slab_caches list entry allocate kmem_cache node unlock slab_mutex lock slab_mutex init_kmem_cache_nodes: for_each_node_state(node, N_NORMAL_MEMORY) allocate kmem_cache node add kmem_cache to slab_caches list unlock slab_mutex online_pages (continued): node_states_set_node As a result we'll get a kmem cache with not all kmem_cache_nodes allocated. To avoid issues like that we should hold get/put_online_mems() during the whole kmem cache creation/destruction/shrink paths, just like we deal with cpu hotplug. This patch does the trick. Note, that after it's applied, there is no need in taking the slab_mutex for kmem_cache_shrink any more, so it is removed from there. Signed-off-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Tang Chen <[email protected]> Cc: Zhang Yanfei <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Xishi Qiu <[email protected]> Cc: Jiang Liu <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: David Rientjes <[email protected]> Cc: Wen Congyang <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Lai Jiangshan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-04-11mm: slab/slub: use page->list consistently instead of page->lruDave Hansen1-5/+5
'struct page' has two list_head fields: 'lru' and 'list'. Conveniently, they are unioned together. This means that code can use them interchangably, which gets horribly confusing like with this nugget from slab.c: > list_del(&page->lru); > if (page->active == cachep->num) > list_add(&page->list, &n->slabs_full); This patch makes the slab and slub code use page->lru universally instead of mixing ->list and ->lru. So, the new rule is: page->lru is what the you use if you want to keep your page on a list. Don't like the fact that it's not called ->list? Too bad. Signed-off-by: Dave Hansen <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2013-09-04mm/sl[aou]b: Move kmallocXXX functions to common codeChristoph Lameter1-4/+24
The kmalloc* functions of all slab allcoators are similar now so lets move them into slab.h. This requires some function naming changes in slob. As a results of this patch there is a common set of functions for all allocators. Also means that kmalloc_large() is now available in general to perform large order allocations that go directly via the page allocator. kmalloc_large() can be substituted if kmalloc() throws warnings because of too large allocations. kmalloc_large() has exactly the same semantics as kmalloc but can only used for allocations > PAGE_SIZE. Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2013-07-14Merge branch 'slab/for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull slab update from Pekka Enberg: "Highlights: - Fix for boot-time problems on some architectures due to init_lock_keys() not respecting kmalloc_caches boundaries (Christoph Lameter) - CONFIG_SLUB_CPU_PARTIAL requested by RT folks (Joonsoo Kim) - Fix for excessive slab freelist draining (Wanpeng Li) - SLUB and SLOB cleanups and fixes (various people)" I ended up editing the branch, and this avoids two commits at the end that were immediately reverted, and I instead just applied the oneliner fix in between myself. * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux slub: Check for page NULL before doing the node_match check mm/slab: Give s_next and s_stop slab-specific names slob: Check for NULL pointer before calling ctor() slub: Make cpu partial slab support configurable slab: add kmalloc() to kernel API documentation slab: fix init_lock_keys slob: use DIV_ROUND_UP where possible slub: do not put a slab to cpu partial list when cpu_partial is 0 mm/slub: Use node_nr_slabs and node_nr_objs in get_slabinfo mm/slub: Drop unnecessary nr_partials mm/slab: Fix /proc/slabinfo unwriteable for slab mm/slab: Sharing s_next and s_stop between slab and slub mm/slab: Fix drain freelist excessively slob: Rework #ifdeffery in slab.h mm, slab: moved kmem_cache_alloc_node comment to correct place
2013-07-07slob: Check for NULL pointer before calling ctor()Steven Rostedt1-1/+1
While doing some code inspection, I noticed that the slob constructor method can be called with a NULL pointer. If memory is tight and slob fails to allocate with slob_alloc() or slob_new_pages() it still calls the ctor() method with a NULL pointer. Looking at the first ctor() method I found, I noticed that it can not handle a NULL pointer (I'm sure others probably can't either): static void sighand_ctor(void *data) { struct sighand_struct *sighand = data; spin_lock_init(&sighand->siglock); init_waitqueue_head(&sighand->signalfd_wqh); } The solution is to only call the ctor() method if allocation succeeded. Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2013-07-07slob: use DIV_ROUND_UP where possibleSasha Levin1-1/+1
Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2013-02-23mm: rename page struct field helpersMel Gorman1-1/+1
The function names page_xchg_last_nid(), page_last_nid() and reset_page_last_nid() were judged to be inconsistent so rename them to a struct_field_op style pattern. As it looked jarring to have reset_page_mapcount() and page_nid_reset_last() beside each other in memmap_init_zone(), this patch also renames reset_page_mapcount() to page_mapcount_reset(). There are others like init_page_count() but as it is used throughout the arch code a rename would likely cause more conflicts than it is worth. [[email protected]: fix zcache] Signed-off-by: Mel Gorman <[email protected]> Suggested-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-18sl[au]b: always get the cache from its page in kmem_cache_free()Glauber Costa1-1/+1
struct page already has this information. If we start chaining caches, this information will always be more trustworthy than whatever is passed into the function. Signed-off-by: Glauber Costa <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: JoonSoo Kim <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Suleiman Souhlal <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-12-11mm/sl[aou]b: Common alignment codeChristoph Lameter1-10/+0
Extract the code to do object alignment from the allocators. Do the alignment calculations in slab_common so that the __kmem_cache_create functions of the allocators do not have to deal with alignment. Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-10-31mm/slob: use min_t() to compare ARCH_SLAB_MINALIGNArnd Bergmann1-3/+3
The definition of ARCH_SLAB_MINALIGN is architecture dependent and can be either of type size_t or int. Comparing that value with ARCH_KMALLOC_MINALIGN can cause harmless warnings on platforms where they are different. Since both are always small positive integer numbers, using the size_t type to compare them is safe and gets rid of the warning. Without this patch, building ARM collie_defconfig results in: mm/slob.c: In function '__kmalloc_node': mm/slob.c:431:152: warning: comparison of distinct pointer types lacks a cast [enabled by default] mm/slob.c: In function 'kfree': mm/slob.c:484:153: warning: comparison of distinct pointer types lacks a cast [enabled by default] mm/slob.c: In function 'ksize': mm/slob.c:503:153: warning: comparison of distinct pointer types lacks a cast [enabled by default] Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> [ [email protected]: updates for master ] Signed-off-by: Pekka Enberg <[email protected]>
2012-10-31mm/slob: Use free_page instead of put_page for page-size kmalloc allocationsEzequiel Garcia1-1/+1
When freeing objects, the slob allocator currently free empty pages calling __free_pages(). However, page-size kmallocs are disposed using put_page() instead. It makes no sense to call put_page() for kernel pages that are provided by the object allocator, so we shouldn't be doing this ourselves. This is based on: commit d9b7f22623b5fa9cc189581dcdfb2ac605933bf4 Author: Glauber Costa <[email protected]> slub: use free_page instead of put_page for freeing kmalloc allocation Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Matt Mackall <[email protected]> Acked-by: Glauber Costa <[email protected]> Signed-off-by: Ezequiel Garcia <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-10-31mm/sl[aou]b: Move common kmem_cache_size() to slab.hEzequiel Garcia1-6/+0
This function is identically defined in all three allocators and it's trivial to move it to slab.h Since now it's static, inline, header-defined function this patch also drops the EXPORT_SYMBOL tag. Cc: Pekka Enberg <[email protected]> Cc: Matt Mackall <[email protected]> Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Ezequiel Garcia <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-10-31mm/slob: Use object_size field in kmem_cache_size()Ezequiel Garcia1-3/+3
Fields object_size and size are not the same: the latter might include slab metadata. Return object_size field in kmem_cache_size(). Also, improve trace accuracy by correctly tracing reported size. Cc: Pekka Enberg <[email protected]> Cc: Matt Mackall <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Ezequiel Garcia <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-10-31mm/slob: Drop usage of page->private for storing page-sized allocationsEzequiel Garcia1-14/+10
This field was being used to store size allocation so it could be retrieved by ksize(). However, it is a bad practice to not mark a page as a slab page and then use fields for special purposes. There is no need to store the allocated size and ksize() can simply return PAGE_SIZE << compound_order(page). Cc: Pekka Enberg <[email protected]> Cc: Matt Mackall <[email protected]> Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Ezequiel Garcia <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-10-03Merge branch 'slab/tracing' into slab/for-linusPekka Enberg1-1/+1
2012-10-03Merge branch 'slab/common-for-cgroups' into slab/for-linusPekka Enberg1-33/+27
Fix up a trivial conflict with NUMA_NO_NODE cleanups. Conflicts: mm/slob.c Signed-off-by: Pekka Enberg <[email protected]>
2012-09-26mm, slob: fix build breakage in __kmalloc_node_track_callerDavid Rientjes1-1/+1
On Sat, 8 Sep 2012, Ezequiel Garcia wrote: > @@ -454,15 +455,35 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node) > gfp |= __GFP_COMP; > ret = slob_new_pages(gfp, order, node); > > - trace_kmalloc_node(_RET_IP_, ret, > + trace_kmalloc_node(caller, ret, > size, PAGE_SIZE << order, gfp, node); > } > > kmemleak_alloc(ret, size, 1, gfp); > return ret; > } > + > +void *__kmalloc_node(size_t size, gfp_t gfp, int node) > +{ > + return __do_kmalloc_node(size, gfp, node, _RET_IP_); > +} > EXPORT_SYMBOL(__kmalloc_node); > > +#ifdef CONFIG_TRACING > +void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller) > +{ > + return __do_kmalloc_node(size, gfp, NUMA_NO_NODE, caller); > +} > + > +#ifdef CONFIG_NUMA > +void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags, > + int node, unsigned long caller) > +{ > + return __do_kmalloc_node(size, gfp, node, caller); > +} > +#endif This breaks Pekka's slab/next tree with this: mm/slob.c: In function '__kmalloc_node_track_caller': mm/slob.c:488: error: 'gfp' undeclared (first use in this function) mm/slob.c:488: error: (Each undeclared identifier is reported only once mm/slob.c:488: error: for each function it appears in.) mm, slob: fix build breakage in __kmalloc_node_track_caller "mm, slob: Add support for kmalloc_track_caller()" breaks the build because gfp is undeclared. Fix it. Acked-by: Ezequiel Garcia <[email protected]> Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-25mm, slob: Add support for kmalloc_track_caller()Ezequiel Garcia1-3/+24
Currently slob falls back to regular kmalloc for this case. With this patch kmalloc_track_caller() is correctly implemented, thus tracing the specified caller. This is important to trace accurately allocations performed by krealloc, kstrdup, kmemdup, etc. Signed-off-by: Ezequiel Garcia <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-25mm, slob: Use NUMA_NO_NODE instead of -1Ezequiel Garcia1-3/+3
Acked-by: David Rientjes <[email protected]> Signed-off-by: Ezequiel Garcia <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-05mm/sl[aou]b: Move kmem_cache refcounting to common codeChristoph Lameter1-1/+0
Get rid of the refcount stuff in the allocators and do that part of kmem_cache management in the common code. Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-05mm/sl[aou]b: Shrink __kmem_cache_create() parameter listsChristoph Lameter1-5/+3
Do the initial settings of the fields in common code. This will allow us to push more processing into common code later and improve readability. Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-05mm/sl[aou]b: Move kmem_cache allocations into common codeChristoph Lameter1-25/+17
Shift the allocations to common code. That way the allocation and freeing of the kmem_cache structures is handled by common code. Reviewed-by: Glauber Costa <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-05mm/sl[aou]b: Get rid of __kmem_cache_destroyChristoph Lameter1-4/+0
What is done there can be done in __kmem_cache_shutdown. This affects RCU handling somewhat. On rcu free all slab allocators do not refer to other management structures than the kmem_cache structure. Therefore these other structures can be freed before the rcu deferred free to the page allocator occurs. Reviewed-by: Joonsoo Kim <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-05mm/sl[aou]b: Move freeing of kmem_cache structure to common codeChristoph Lameter1-2/+0
The freeing action is basically the same in all slab allocators. Move to the common kmem_cache_destroy() function. Reviewed-by: Glauber Costa <[email protected]> Reviewed-by: Joonsoo Kim <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-05mm/sl[aou]b: Use "kmem_cache" name for slab cache with kmem_cache structChristoph Lameter1-0/+8
Make all allocators use the "kmem_cache" slabname for the "kmem_cache" structure. Reviewed-by: Glauber Costa <[email protected]> Reviewed-by: Joonsoo Kim <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-05mm/sl[aou]b: Extract a common function for kmem_cache_destroyChristoph Lameter1-8/+7
kmem_cache_destroy does basically the same in all allocators. Extract common code which is easy since we already have common mutex handling. Reviewed-by: Glauber Costa <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>
2012-09-05mm/sl[aou]b: Move list_add() to slab_common.cChristoph Lameter1-0/+4
Move the code to append the new kmem_cache to the list of slab caches to the kmem_cache_create code in the shared code. This is possible now since the acquisition of the mutex was moved into kmem_cache_create(). Acked-by: David Rientjes <[email protected]> Reviewed-by: Glauber Costa <[email protected]> Reviewed-by: Joonsoo Kim <[email protected]> Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>