Age | Commit message (Collapse) | Author | Files | Lines
2022-09-30Merge branch 'slab/for-6.1/slub_validation_locking' into slab/for-nextVlastimil Babka1-11/+14
A fix for a regression in slub_debug caches that could cause slab page leaks and subsequent warnings on cache shutdown, by Feng Tang.
2022-09-30mm/slub: fix a slab missed to be freed problemFeng Tang1-11/+14
When enabling kasan and kfence's in-kernel kunit tests with slub_debug on, they caught a problem (in the linux-next tree): ------------[ cut here ]------------ kmem_cache_destroy test: Slab cache still has objects when called from test_exit+0x1a/0x30 WARNING: CPU: 3 PID: 240 at mm/slab_common.c:492 kmem_cache_destroy+0x16c/0x170 Modules linked in: CPU: 3 PID: 240 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc7-next-20220929 #52 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:kmem_cache_destroy+0x16c/0x170 Code: 41 5c 41 5d e9 a5 04 0b 00 c3 cc cc cc cc 48 8b 55 60 48 8b 4c 24 20 48 c7 c6 40 37 d2 82 48 c7 c7 e8 a0 33 83 e8 4e d7 14 01 <0f> 0b eb a7 41 56 41 89 d6 41 55 49 89 f5 41 54 49 89 fc 55 48 89 RSP: 0000:ffff88800775fea0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff83bdec48 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 1ffff11000eebf9e RDI: ffffed1000eebfc6 RBP: ffff88804362fa00 R08: ffffffff81182e58 R09: ffff88800775fbdf R10: ffffed1000eebf7b R11: 0000000000000001 R12: 000000008c800d00 R13: ffff888005e78040 R14: 0000000000000000 R15: ffff888005cdfad0 FS: 0000000000000000(0000) GS:ffff88807ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000360e001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> test_exit+0x1a/0x30 kunit_try_run_case+0xad/0xc0 kunit_generic_run_threadfn_adapter+0x26/0x50 kthread+0x17b/0x1b0 It was bisected to commit c7323a5ad078 ("mm/slub: restrict sysfs validation to debug caches and make it safe") The problem is inside free_debug_processing(): under certain circumstances the slab can be removed from the partial list but not freed by discard_slab(), and thus n->nr_slabs is not decreased accordingly. During shutdown, this non-zero n->nr_slabs is detected and reported. Specifically, the problem is that there are two checks for detecting a full partial list by comparing n->nr_partial >= s->min_partial, where the latter check is affected by remove_partial() decreasing n->nr_partial between the checks. Reorganize the code so there is a single check upfront. Link: https://lore.kernel.org/all/[email protected]/ Fixes: c7323a5ad078 ("mm/slub: restrict sysfs validation to debug caches and make it safe") Signed-off-by: Feng Tang <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
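An illustrative sketch of the reorganization described above (not the exact patch; 's', 'n' and 'slab' are the usual SLUB cache/node/slab variables): evaluate the partial-list-full condition once, before remove_partial() can change n->nr_partial, and reuse that result for the discard decision:

  /* Decide once whether the now-empty slab should leave the node. */
  bool free_slab = (slab->inuse == 0) && (n->nr_partial >= s->min_partial);

  if (free_slab)
          remove_partial(n, slab);
  /* ... debug processing ... */
  if (free_slab)
          discard_slab(s, slab);   /* keeps n->nr_slabs in sync */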
2022-09-29Merge branch 'slab/for-6.1/kmalloc_size_roundup' into slab/for-nextVlastimil Babka5-13/+93
The first two patches from a series by Kees Cook [1] that introduce kmalloc_size_roundup(). This will allow merging of per-subsystem patches using the new function and ultimately stop (ab)using ksize() in a way that causes ongoing trouble for debugging functionality and static checkers. [1] https://lore.kernel.org/all/[email protected]/ -- Resolved a conflict of modifying mm/slab.c __ksize() comment with a commit that unifies __ksize() implementation into mm/slab_common.c
2022-09-29Merge branch 'slab/for-6.1/slub_debug_waste' into slab/for-nextVlastimil Babka4-50/+142
A patch from Feng Tang that enhances the existing debugfs alloc_traces file for kmalloc caches with information about how much space is wasted by allocations that need less space than the particular kmalloc cache provides.
2022-09-29Merge branch 'slab/for-6.1/trivial' into slab/for-nextVlastimil Babka1-3/+6
Additional cleanup by Chao Yu removing a BUG_ON() in create_unique_id().
2022-09-29slab: Introduce kmalloc_size_roundup()Kees Cook4-3/+71
In the effort to help the compiler reason about buffer sizes, the __alloc_size attribute was added to allocators. This improves the scope of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well, as the vast majority of callers are not expecting to use more memory than what they asked for. There is, however, one common exception to this: anticipatory resizing of kmalloc allocations. These cases all use ksize() to determine the actual bucket size of a given allocation (e.g. 128 when 126 was asked for). This comes in two styles in the kernel: 1) An allocation has been determined to be too small, and needs to be resized. Instead of the caller choosing its own next best size, it wants to minimize the number of calls to krealloc(), so it just uses ksize() plus some additional bytes, forcing the realloc into the next bucket size, from which it can learn how large it is now. For example: data = krealloc(data, ksize(data) + 1, gfp); data_len = ksize(data); 2) The minimum size of an allocation is calculated, but since it may grow in the future, just use all the space available in the chosen bucket immediately, to avoid needing to reallocate later. A good example of this is skbuff's allocators: data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); ... /* kmalloc(size) might give us more room than requested. * Put skb_shared_info exactly at the end of allocated zone, * to allow max possible filling before reallocation. */ osize = ksize(data); size = SKB_WITH_OVERHEAD(osize); In both cases, the "how much was actually allocated?" question is answered _after_ the allocation, where the compiler hinting is not in an easy place to make the association any more. This mismatch between the compiler's view of the buffer length and the code's intention about how much it is going to actually use has already caused problems[1]. It is possible to fix this by reordering the use of the "actual size" information. We can serve the needs of users of ksize() and still have accurate buffer length hinting for the compiler by doing the bucket size calculation _before_ the allocation. Code can instead ask "how large an allocation would I get for a given size?". Introduce kmalloc_size_roundup(), to serve this function so we can start replacing the "anticipatory resizing" uses of ksize(). [1] https://github.com/ClangBuiltLinux/linux/issues/1599 https://github.com/KSPP/linux/issues/183 [ [email protected]: add SLOB version ] Cc: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
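A minimal usage sketch under that model (variable names illustrative): round the size up first, then allocate, so the compiler's size hint and the code's later use of the buffer agree:

  size_t alloc_size = kmalloc_size_roundup(count);   /* e.g. 126 -> 128 */
  void *data = kmalloc(alloc_size, GFP_KERNEL);

  if (!data)
          return -ENOMEM;
  data_len = alloc_size;   /* no ksize() needed after the allocation */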
2022-09-29slab: Remove __malloc attribute from realloc functionsKees Cook4-12/+15
The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed. Thanks to Geert Uytterhoeven <[email protected]> for reporting build failures with gcc-8 in earlier version which tried to remove the #ifdef. While the "alloc_size" attribute is available on all GCC versions, I forgot that it gets disabled explicitly by the kernel in GCC < 9.1 due to misbehaviors. Add a note to the compiler_attributes.h entry for it. Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Marco Elver <[email protected]> Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
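A hedged example of the shape this takes; the real declarations live in include/linux/slab.h, this just shows where the new attribute applies:

  void *krealloc(const void *objp, size_t new_size, gfp_t flags)
          __realloc_size(2);   /* size hint only, no __malloc non-alias promise */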
2022-09-26mm/slub: clean up create_unique_id()Chao Yu1-3/+6
As Christophe JAILLET suggested [1] for create_unique_id(): "looks that ID_STR_LENGTH could even be reduced to 32 or 16. The 2nd BUG_ON at the end of the function could certainly be just removed as well or remplaced by a: if (p > name + ID_STR_LENGTH - 1) { kfree(name); return -E<something>; } " Following that suggestion, apply the cleanups below: 1. reduce ID_STR_LENGTH to 32, as that buffer size should be enough; 2. use WARN_ON() instead of BUG_ON() and return an error if the check condition is true; 3. use snprintf() instead of sprintf() to avoid overflow. [1] https://lore.kernel.org/linux-mm/[email protected]/ Suggested-by: Christophe JAILLET <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
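A rough sketch of the resulting pattern (bounds arithmetic and return value are illustrative, not copied from the patch):

  #define ID_STR_LENGTH 32
  ...
  p += snprintf(p, ID_STR_LENGTH - (p - name), "%07u", s->size);
  if (WARN_ON(p > name + ID_STR_LENGTH - 1)) {
          kfree(name);
          return -EINVAL;
  }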
2022-09-23mm/slub: enable debugging memory wasting of kmallocFeng Tang4-50/+142
kmalloc's API family is critical for mm, and one of its characteristics is that it rounds up the request size to a fixed one (mostly a power of 2). Say a user requests memory for '2^n + 1' bytes: actually 2^(n+1) bytes could be allocated, so in the worst case around 50% of the memory space is wasted. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have a longer life time. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB From debugging we found there is a huge number of 'struct iova_magazine' objects, whose size is 1032 bytes (1024 + 8), so each allocation wastes 1016 bytes. Though the issue was solved by providing the right (bigger) amount of RAM, it is still nice to optimize the size (either use a kmalloc-friendly size or create a dedicated slab for it). And from the lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related to a similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernels [2]; these logs are just picked to demonstrate the problem, and a patch changing its size to 1024 bytes has also been merged) So add a way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show the call stack of the original allocation, so that users can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. The waste info is integrated into the existing interface: '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces'; one example of 'kmalloc-4k' after boot is: 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1 __kmem_cache_alloc_node+0x11f/0x4e0 __kmalloc_node+0x4e/0x140 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe] ixgbe_probe+0x165f/0x1d20 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 ... which means that in the 'kmalloc-4k' slab there are 126 requests of 2240 bytes which got a 4KB space (wasting 1856 bytes each and 233856 bytes in total), from ixgbe_alloc_q_vector(). And when the system starts some real workload like multiple docker instances, there could be more severe waste. [1]. https://lkml.org/lkml/2019/8/12/266 [2]. https://lore.kernel.org/lkml/[email protected]/ [Thanks Hyeonggon for pointing out several bugs about sorting/format] [Thanks Vlastimil for suggesting a way to reduce memory usage of orig_size and keep it only for kmalloc objects] Signed-off-by: Feng Tang <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Cc: Robin Murphy <[email protected]> Cc: John Garry <[email protected]> Cc: Kefeng Wang <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-09-23Merge branch 'slab/for-6.1/slub_validation_locking' into slab/for-nextVlastimil Babka1-198/+276
My series [1] to fix validation races for caches with enabled debugging. By decoupling the debug cache operation more from non-debug fastpaths, additional locking simplifications were possible and done afterwards. Additional cleanup of PREEMPT_RT specific code on top, by Thomas Gleixner. [1] https://lore.kernel.org/all/[email protected]/
2022-09-23Merge branch 'slab/for-6.1/common_kmalloc' into slab/for-nextVlastimil Babka8-655/+341
The "common kmalloc v4" series [1] by Hyeonggon Yoo. - Improves the mm/slab_common.c wrappers to allow deleting duplicated code between SLAB and SLUB. - Large kmalloc() allocations in SLAB are passed to page allocator like in SLUB, reducing number of kmalloc caches. - Removes the {kmem_cache_alloc,kmalloc}_node variants of tracepoints, node id parameter added to non-_node variants. - 8 files changed, 341 insertions(+), 651 deletions(-) [1] https://lore.kernel.org/all/[email protected]/ -- Merge resolves trivial conflict in mm/slub.c with commit 5373b8a09d6e ("kasan: call kasan_malloc() from __kmalloc_*track_caller()")
2022-09-23Merge branch 'slab/for-6.1/trivial' into slab/for-nextVlastimil Babka2-12/+3
Trivial fixes and cleanups: - unneeded variable removals, by ye xingchen
2022-09-22mm: slub: fix flush_cpu_slab()/__free_slab() invocations in task context.Maurizio Lombardi1-1/+8
Commit 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context") moved all flush_cpu_slab() invocations to the global workqueue to avoid a problem related to deactivate_slab()/__free_slab() being called from an IRQ context on PREEMPT_RT kernels. When the flush_all_cpus_locked() function is called from a task context, it may happen that a workqueue with the WQ_MEM_RECLAIM bit set ends up flushing the global workqueue, which causes a dependency issue: workqueue: WQ_MEM_RECLAIM nvme-delete-wq:nvme_delete_ctrl_work [nvme_core] is flushing !WQ_MEM_RECLAIM events:flush_cpu_slab WARNING: CPU: 37 PID: 410 at kernel/workqueue.c:2637 check_flush_dependency+0x10a/0x120 Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core] RIP: 0010:check_flush_dependency+0x10a/0x120[ 453.262125] Call Trace: __flush_work.isra.0+0xbf/0x220 ? __queue_work+0x1dc/0x420 flush_all_cpus_locked+0xfb/0x120 __kmem_cache_shutdown+0x2b/0x320 kmem_cache_destroy+0x49/0x100 bioset_exit+0x143/0x190 blk_release_queue+0xb9/0x100 kobject_cleanup+0x37/0x130 nvme_fc_ctrl_free+0xc6/0x150 [nvme_fc] nvme_free_ctrl+0x1ac/0x2b0 [nvme_core] Fix this bug by creating a workqueue for the flush operation with the WQ_MEM_RECLAIM bit set. Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context") Cc: <[email protected]> Signed-off-by: Maurizio Lombardi <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
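A sketch of the fix's approach; the workqueue name and the init hook are assumptions based on the commit description, not quoted from the patch:

  static struct workqueue_struct *flushwq;

  void __init kmem_cache_init_late(void)
  {
          /* dedicated WQ_MEM_RECLAIM workqueue for flush_cpu_slab() work */
          flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
          WARN_ON(!flushwq);
  }

  /* flush_all_cpus_locked() then queues its per-CPU work on flushwq
   * instead of the global system workqueue. */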
2022-09-19mm/slab_common: fix possible double free of kmem_cacheFeng Tang1-1/+4
When doing slub_debug testing, kfence's 'test_memcache_typesafe_by_rcu' kunit test case causes a use-after-free error: BUG: KASAN: use-after-free in kobject_del+0x14/0x30 Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261 CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_address_description.constprop.0+0x87/0x2a5 print_report+0x103/0x1ed kasan_report+0xb7/0x140 kobject_del+0x14/0x30 kmem_cache_destroy+0x130/0x170 test_exit+0x1a/0x30 kunit_try_run_case+0xad/0xc0 kunit_generic_run_threadfn_adapter+0x26/0x50 kthread+0x17b/0x1b0 </TASK> The cause is inside kmem_cache_destroy(): kmem_cache_destroy acquire lock/mutex shutdown_cache schedule_work(kmem_cache_release) (if RCU flag set) release lock/mutex kmem_cache_release (if RCU flag not set) With certain timing, the scheduled work could be run before the next RCU flag check, which can then see a wrong value and lead to a double kmem_cache_release(). Fix it by caching the RCU flag inside the protected area, just like 'refcnt'. Fixes: 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock") Signed-off-by: Feng Tang <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Waiman Long <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
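An illustrative sketch of the fix: snapshot the SLAB_TYPESAFE_BY_RCU flag while still inside the protected section, the same way 'refcnt' is handled, so the later decision cannot race with the scheduled release work (refcount handling omitted):

  int err;
  bool rcu_set;

  cpus_read_lock();
  mutex_lock(&slab_mutex);
  rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;   /* cached under the lock */
  err = shutdown_cache(s);
  mutex_unlock(&slab_mutex);
  cpus_read_unlock();

  if (!err && !rcu_set)
          kmem_cache_release(s);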
2022-09-17slub: Make PREEMPT_RT support less convolutedThomas Gleixner1-32/+24
The slub code already has a few helpers depending on PREEMPT_RT. Add a few more and get rid of the CONFIG_PREEMPT_RT conditionals all over the place. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: [email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-09-17mm/slub: simplify __cmpxchg_double_slab() and slab_[un]lock()Vlastimil Babka1-27/+12
The PREEMPT_RT specific disabling of irqs in __cmpxchg_double_slab() (through slab_[un]lock()) is unnecessary as bit_spin_lock() disables preemption and that's sufficient on PREEMPT_RT where no allocation/free operation is performed in hardirq context and so can't interrupt the current operation. That means we no longer need the slab_[un]lock() wrappers, so delete them and rename the current __slab_[un]lock() to slab_[un]lock(). Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
2022-09-17mm/slub: convert object_map_lock to non-raw spinlockVlastimil Babka1-30/+6
The only remaining user of object_map_lock is list_slab_objects(). Obtaining the lock there used to happen under slab_lock() which implied disabling irqs on PREEMPT_RT, thus it's a raw_spinlock. With the slab_lock() removed, we can convert it to a normal spinlock. Also remove the get_map()/put_map() wrappers as list_slab_objects() became their only remaining user. Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
2022-09-17mm/slub: remove slab_lock() usage for debug operationsVlastimil Babka1-11/+8
All alloc and free operations on debug caches are now serialized by n->list_lock, so we can remove slab_lock() usage in validate_slab() and list_slab_objects() as those also happen under n->list_lock. Note the usage in list_slab_objects() could happen even on non-debug caches, but only during cache shutdown time, so there should not be any parallel freeing activity anymore. Except for buggy slab users, but in that case the slab_lock() would not help against the common cmpxchg based fast paths (in non-debug caches) anyway. Also adjust documentation comments accordingly. Suggested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Acked-by: David Rientjes <[email protected]>
2022-09-17mm/slub: restrict sysfs validation to debug caches and make it safeVlastimil Babka1-52/+180
Rongwei Wang reports [1] that cache validation triggered by writing to /sys/kernel/slab/<cache>/validate is racy against normal cache operations (e.g. freeing) in a way that can cause false positive inconsistency reports for caches with debugging enabled. The problem is that debugging actions that mark an object free or active and actual freelist operations are not atomic, and the validation can see an inconsistent state. For caches that do or don't have debugging enabled, additional races involving n->nr_slabs are possible that result in false reports of wrong slab counts. This patch attempts to solve these issues while not adding overhead to normal (especially fastpath) operations for caches that do not have debugging enabled. Such overhead would not be justified just to make userspace-triggered validation safe. Instead, disable the validation for caches that don't have debugging enabled and make their sysfs validate handler return -EINVAL. For caches that do have debugging enabled, we can instead extend the existing approach of not using percpu freelists to force all alloc/free operations to the slow paths, where the debugging flags are checked and acted upon. There we can adjust the debug-specific paths to increase n->list_lock coverage against concurrent validation as necessary. The processing on free in free_debug_processing() already happens under n->list_lock, so we can extend it to actually do the freeing as well and thus make it atomic against concurrent validation. As observed by Hyeonggon Yoo, we do not really need to take slab_lock() anymore here because all paths we could race with are protected by n->list_lock under the new scheme, so drop its usage here. The processing on alloc in alloc_debug_processing() currently doesn't take any locks, but we have to first allocate the object from a slab on the partial list (as debugging caches have no percpu slabs) and thus take the n->list_lock anyway. Add a function alloc_single_from_partial() that grabs just the allocated object instead of the whole freelist, and does the debug processing. The n->list_lock coverage again makes it atomic against validation, and it is also ultimately more efficient than the current grabbing of the freelist immediately followed by slab deactivation. To prevent races on n->nr_slabs updates, make sure that for caches with debugging enabled, inc_slabs_node() or dec_slabs_node() is called under n->list_lock. When allocating a new slab for a debug cache, handle the allocation by a new function alloc_single_from_new_slab() instead of the current forced deactivation path. Neither of these changes affects the fast paths at all. The changes in slow paths are negligible for non-debug caches. [1] https://lore.kernel.org/all/[email protected]/ Reported-by: Rongwei Wang <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]>
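A hedged sketch of the sysfs-facing part of the change: the validate attribute acts only on debug caches and otherwise reports -EINVAL:

  static ssize_t validate_store(struct kmem_cache *s,
                                const char *buf, size_t length)
  {
          int ret = -EINVAL;

          if (buf[0] == '1' && kmem_cache_debug(s)) {
                  ret = validate_slab_cache(s);
                  if (ret >= 0)
                          ret = length;
          }
          return ret;
  }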
2022-09-16kasan: call kasan_malloc() from __kmalloc_*track_caller()Peter Collingbourne1-0/+4
We were failing to call kasan_malloc() from __kmalloc_*track_caller() which was causing us to sometimes fail to produce KASAN error reports for allocations made using e.g. devm_kcalloc(), as the KASAN poison was not being initialized. Fix it. Signed-off-by: Peter Collingbourne <[email protected]> Cc: <[email protected]> # 5.15 Signed-off-by: Vlastimil Babka <[email protected]>
2022-09-08mm/slub: fix to return errno if kmalloc() failsChao Yu1-1/+4
In create_unique_id(), kmalloc(, GFP_KERNEL) can fail due to out-of-memory; if it fails, return the errno correctly rather than triggering a panic via BUG_ON(): kernel BUG at mm/slub.c:5893! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Call trace: sysfs_slab_add+0x258/0x260 mm/slub.c:5973 __kmem_cache_create+0x60/0x118 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x19c/0x31c mm/slab_common.c:335 kmem_cache_create+0x1c/0x28 mm/slab_common.c:390 f2fs_kmem_cache_create fs/f2fs/f2fs.h:2766 [inline] f2fs_init_xattr_caches+0x78/0xb4 fs/f2fs/xattr.c:808 f2fs_fill_super+0x1050/0x1e0c fs/f2fs/super.c:4149 mount_bdev+0x1b8/0x210 fs/super.c:1400 f2fs_mount+0x44/0x58 fs/f2fs/super.c:4512 legacy_get_tree+0x30/0x74 fs/fs_context.c:610 vfs_get_tree+0x40/0x140 fs/super.c:1530 do_new_mount+0x1dc/0x4e4 fs/namespace.c:3040 path_mount+0x358/0x914 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __arm64_sys_mount+0x2f8/0x408 fs/namespace.c:3568 Cc: <[email protected]> Fixes: 81819f0fc8285 ("SLUB core") Reported-by: [email protected] Reviewed-by: Muchun Song <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Chao Yu <[email protected]> Acked-by: David Rientjes <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
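A minimal sketch of the fix: propagate -ENOMEM instead of hitting BUG_ON() when the allocation fails:

  name = kmalloc(ID_STR_LENGTH, GFP_KERNEL);
  if (!name)
          return -ENOMEM;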
2022-09-01mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lockWaiman Long1-16/+29
A circular locking problem is reported by lockdep due to the following circular locking dependency. +--> cpu_hotplug_lock --> slab_mutex --> kn->active --+ | | +-----------------------------------------------------+ The forward cpu_hotplug_lock ==> slab_mutex ==> kn->active dependency happens in kmem_cache_destroy(): cpus_read_lock(); mutex_lock(&slab_mutex); ==> sysfs_slab_unlink() ==> kobject_del() ==> kernfs_remove() ==> __kernfs_remove() ==> kernfs_drain(): rwsem_acquire(&kn->dep_map, ...); The backward kn->active ==> cpu_hotplug_lock dependency happens in kernfs_fop_write_iter(): kernfs_get_active(); ==> slab_attr_store() ==> cpu_partial_store() ==> flush_all(): cpus_read_lock() One way to break this circular locking chain is to avoid holding cpu_hotplug_lock and slab_mutex while deleting the kobject in sysfs_slab_unlink() which should be equivalent to doing a write_lock and write_unlock pair of the kn->active virtual lock. Since the kobject structures are not protected by slab_mutex or the cpu_hotplug_lock, we can certainly release those locks before doing the delete operation. Move sysfs_slab_unlink() and sysfs_slab_release() to the newly created kmem_cache_release() and call it outside the slab_mutex & cpu_hotplug_lock critical sections. There will be a slight delay in the deletion of sysfs files if kmem_cache_release() is called indirectly from a work function. Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context") Signed-off-by: Waiman Long <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Acked-by: David Rientjes <[email protected]> Link: https://lore.kernel.org/all/YwOImVd+nRUsSAga@hyeyoo/ Signed-off-by: Vlastimil Babka <[email protected]>
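An illustrative ordering sketch based on the commit message (error handling, refcounting and the RCU/workqueue path are omitted):

  static void kmem_cache_release(struct kmem_cache *s)
  {
          /* kobject deletion now happens outside the locked region */
          sysfs_slab_unlink(s);
          sysfs_slab_release(s);
  }

  void kmem_cache_destroy(struct kmem_cache *s)
  {
          int err;

          cpus_read_lock();
          mutex_lock(&slab_mutex);
          err = shutdown_cache(s);
          mutex_unlock(&slab_mutex);
          cpus_read_unlock();

          if (!err)
                  kmem_cache_release(s);
  }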
2022-09-01mm/sl[au]b: check if large object is valid in __ksize()Hyeonggon Yoo1-1/+6
If the address of a large object is not the beginning of its folio, or the size of the folio is too small, it must be invalid. WARN() and return 0 in such cases. Cc: Marco Elver <[email protected]> Suggested-by: Vlastimil Babka <[email protected]> Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
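A hedged sketch of the added check for the large (non-slab) object case in __ksize(); exact predicates may differ from the merged code:

  if (unlikely(!folio_test_slab(folio))) {
          if (WARN_ON(folio_size(folio) <= KMALLOC_MAX_CACHE_SIZE))
                  return 0;
          if (WARN_ON(object != folio_address(folio)))
                  return 0;
          return folio_size(folio);
  }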
2022-09-01mm/slab_common: move declaration of __ksize() to mm/slab.hHyeonggon Yoo4-12/+3
__ksize() is only called by KASAN. Remove export symbol and move declaration to mm/slab.h as we don't want to grow its callers. Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-09-01mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not usingHyeonggon Yoo5-51/+64
Drop the kmem_alloc event class, and define kmalloc and kmem_cache_alloc using the TRACE_EVENT() macro. And then this patch does: - Do not pass a pointer to struct kmem_cache to trace_kmalloc; the gfp flag is enough to know if it's accounted or not. - Avoid dereferencing s->object_size and s->size when not using the kmem_cache_alloc event. - Avoid dereferencing s->name when not using the kmem_cache_free event. - Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB Cc: Vasily Averin <[email protected]> Suggested-by: Vlastimil Babka <[email protected]> Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-09-01mm/slab_common: unify NUMA and UMA version of tracepointsHyeonggon Yoo5-89/+27
Drop kmem_alloc event class, rename kmem_alloc_node to kmem_alloc, and remove _node postfix for NUMA version of tracepoints. This will break some tools that depend on {kmem_cache_alloc,kmalloc}_node, but at this point maintaining both kmem_alloc and kmem_alloc_node event classes does not makes sense at all. Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-09-01mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace()Hyeonggon Yoo4-75/+41
Despite its name, kmem_cache_alloc[_node]_trace() is a hook for inlined kmalloc. So rename it to kmalloc[_node]_trace(). Move its implementation to slab_common.c by using __kmem_cache_alloc_node(), but keep the CONFIG_TRACING=n variants to save a function call when CONFIG_TRACING=n. Use __assume_kmalloc_alignment for kmalloc[_node]_trace instead of __assume_slab_alignment. Generally kmalloc has larger alignment requirements. Suggested-by: Vlastimil Babka <[email protected]> Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-09-01mm/sl[au]b: generalize kmalloc subsystemHyeonggon Yoo5-200/+107
Now everything in kmalloc subsystem can be generalized. Let's do it! Generalize __do_kmalloc_node(), __kmalloc_node_track_caller(), kfree(), __ksize(), __kmalloc(), __kmalloc_node() and move them to slab_common.c. In the meantime, rename kmalloc_large_node_notrace() to __kmalloc_large_node() and make it static as it's now only called in slab_common.c. [ [email protected]: adjust kfence skip list to include __kmem_cache_free so that kfence kunit tests do not fail ] Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-25mm/slub: move free_debug_processing() furtherVlastimil Babka1-57/+57
In the following patch, the function free_debug_processing() will be calling add_partial(), remove_partial() and discard_slab(), so move it below their definitions to avoid forward declarations. To make review easier, separate the move from functional changes. Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Acked-by: David Rientjes <[email protected]>
2022-08-24mm/sl[au]b: introduce common alloc/free functions without tracepointHyeonggon Yoo3-7/+47
To unify kmalloc functions in a later patch, introduce common alloc/free functions that do not have tracepoints. Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-24mm/slab: kmalloc: pass requests larger than order-1 page to page allocatorHyeonggon Yoo5-62/+68
There is not much benefit in serving large objects from kmalloc(). Let's pass large requests to the page allocator, like SLUB does, for better maintenance of common code. Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
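A rough sketch of the resulting dispatch in SLAB's kmalloc path (placement and surrounding code simplified; kmalloc_large_node() is assumed to be available to SLAB after this series):

  if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
          return kmalloc_large_node(size, flags, node);

  cachep = kmalloc_slab(size, flags);
  /* ... regular kmalloc-cache allocation ... */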
2022-08-24mm/slab_common: cleanup kmalloc_large()Hyeonggon Yoo1-22/+13
Now that kmalloc_large() and kmalloc_large_node() do mostly the same job, make kmalloc_large() a wrapper of kmalloc_large_node_notrace(). In the meantime, add the missing flag-fixing code to kmalloc_large_node_notrace(). Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-24mm/slab_common: kmalloc_node: pass large requests to page allocatorHyeonggon Yoo4-9/+32
Now that kmalloc_large_node() is in common code, pass large requests to page allocator in kmalloc_node() using kmalloc_large_node(). One problem is that currently there is no tracepoint in kmalloc_large_node(). Instead of simply putting tracepoint in it, use kmalloc_large_node{,_notrace} depending on its caller to show useful address for both inlined kmalloc_node() and __kmalloc_node_track_caller() when large objects are allocated. Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-24mm/slub: move kmalloc_large_node() to slab_common.cHyeonggon Yoo3-25/+26
In later patch SLAB will also pass requests larger than order-1 page to page allocator. Move kmalloc_large_node() to slab_common.c. Fold kmalloc_large_node_hook() into kmalloc_large_node() as there is no other caller. Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-24mm/slab_common: fold kmalloc_order_trace() into kmalloc_large()Hyeonggon Yoo2-33/+6
There is no caller of kmalloc_order_trace() except kmalloc_large(). Fold it into kmalloc_large() and remove kmalloc_order{,_trace}(). Also add tracepoint in kmalloc_large() that was previously in kmalloc_order_trace(). Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-24mm/sl[au]b: factor out __do_kmalloc_node()Hyeonggon Yoo2-81/+20
__kmalloc(), __kmalloc_node() and __kmalloc_node_track_caller() mostly do the same job. Factor out the common code into __do_kmalloc_node(). Note that this patch also fixes a missing kasan_kmalloc() in SLUB's __kmalloc_node_track_caller(). Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-24mm/slab_common: cleanup kmalloc_track_caller()Hyeonggon Yoo4-43/+8
Make kmalloc_track_caller() a wrapper of kmalloc_node_track_caller(). Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
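The resulting wrapper is essentially a one-liner; a sketch of its shape:

  #define kmalloc_track_caller(size, flags) \
          kmalloc_node_track_caller(size, flags, NUMA_NO_NODE)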
2022-08-24mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functionsHyeonggon Yoo4-40/+1
Now that slab_alloc_node() is available for SLAB when CONFIG_NUMA=n, remove CONFIG_NUMA ifdefs for common kmalloc functions. Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-24mm/slab: cleanup slab_alloc() and slab_alloc_node()Hyeonggon Yoo1-36/+13
Make slab_alloc_node() available even when CONFIG_NUMA=n and make slab_alloc() a wrapper of slab_alloc_node(). This is necessary for further cleanup. Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
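A sketch of the wrapper relationship described above (argument list abbreviated relative to the real code):

  static __always_inline void *
  slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
  {
          return slab_alloc_node(cachep, flags, NUMA_NO_NODE, caller);
  }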
2022-08-24mm/slab: move NUMA-related code to __do_cache_alloc()Hyeonggon Yoo1-37/+31
To implement slab_alloc_node() independent of NUMA configuration, move NUMA fallback/alternate allocation code into __do_cache_alloc(). One functional change here is not to check availability of node when allocating from local node. Signed-off-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-23mm/slub: Remove the unneeded result variableye xingchen1-7/+2
Return the value from attribute->store(s, buf, len) and attribute->show(s, buf) directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <[email protected]> Acked-by: Hyeonggon Yoo <[email protected]> Signed-off-by: ye xingchen <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-23mm/slab_common: Remove the unneeded result variableye xingchen1-5/+1
Return the value from __kmem_cache_shrink() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <[email protected]> Signed-off-by: ye xingchen <[email protected]> Acked-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-08-21Linux 6.0-rc2Linus Torvalds1-1/+1
2022-08-21Merge tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds7-32/+34
Pull irq fixes from Ingo Molnar: "Misc irqchip fixes: LoongArch driver fixes and a Hyper-V IOMMU fix" * tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/loongson-liointc: Fix an error handling path in liointc_init() irqchip/loongarch: Fix irq_domain_alloc_fwnode() abuse irqchip/loongson-pch-pic: Move find_pch_pic() into CONFIG_ACPI irqchip/loongson-eiointc: Fix a build warning irqchip/loongson-eiointc: Fix irq affinity setting iommu/hyper-v: Use helper instead of directly accessing affinity
2022-08-21Merge tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull x86 kprobes fix from Ingo Molnar: "Fix a kprobes bug in JNG/JNLE emulation when a kprobe is installed at such instructions, possibly resulting in incorrect execution (the wrong branch taken)" * tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kprobes: Fix JNG/JNLE emulation
2022-08-21Merge tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-traceLinus Torvalds5-21/+119
Pull tracing fixes from Steven Rostedt: "Various fixes for tracing: - Fix a return value of traceprobe_parse_event_name() - Fix NULL pointer dereference from failed ftrace enabling - Fix NULL pointer dereference when asking for registers from eprobes - Make eprobes consistent with kprobes/uprobes, filters and histograms" * tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have filter accept "common_cpu" to be consistent tracing/probes: Have kprobes and uprobes use $COMM too tracing/eprobes: Have event probes be consistent with kprobes and uprobes tracing/eprobes: Fix reading of string fields tracing/eprobes: Do not hardcode $comm as a string tracing/eprobes: Do not allow eprobes to use $stack, or % for regs ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead tracing/perf: Fix double put of trace event when init fails tracing: React to error return from traceprobe_parse_event_name()
2022-08-21tracing: Have filter accept "common_cpu" to be consistentSteven Rostedt (Google)1-0/+1
Make filtering consistent with histograms. As "cpu" can be a field of an event, allow for "common_cpu" to keep it from being confused with the "cpu" field of the event. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 1e3bac71c5053 ("tracing/histogram: Rename "cpu" to "common_cpu"") Suggested-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21tracing/probes: Have kprobes and uprobes use $COMM tooSteven Rostedt (Google)1-2/+3
Both $comm and $COMM can be used to get current->comm in eprobes and the filtering and histogram logic. Make kprobes and uprobes consistent in this regard and allow both $comm and $COMM as well. Currently kprobes and uprobes only handle $comm, which is inconsistent with the other utilities, and can be confusing to users. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/all/[email protected]/ Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 533059281ee5 ("tracing: probeevent: Introduce new argument fetching code") Suggested-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21tracing/eprobes: Have event probes be consistent with kprobes and uprobesSteven Rostedt (Google)1-6/+64
Currently, if a symbol "@" is attempted to be used with an event probe (eprobes), it will cause a NULL pointer dereference crash. Both kprobes and uprobes can reference data other than the main registers. Such as immediate address, symbols and the current task name. Have eprobes do the same thing. For "comm", if "comm" is used and the event being attached to does not have the "comm" field, then make it the "$comm" that kprobes has. This is consistent to the way histograms and filters work. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-08-21tracing/eprobes: Fix reading of string fieldsSteven Rostedt (Google)1-0/+21
Currently when an event probe (eprobe) hooks to a string field, it does not display it as a string, but instead as a number. This makes the field rather useless. Handle the different kinds of strings, dynamic, static, relational/dynamic etc. Now when a string field is used, the ":string" type can be used to display it: echo "e:sw sched/sched_switch comm=$next_comm:string" > dynamic_events Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Ingo Molnar <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Tzvetomir Stoyanov <[email protected]> Cc: Tom Zanussi <[email protected]> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>