path: root/kernel
2023-10-05  perf/core: Rename perf_proc_update_handler() -> perf_event_max_sample_rate_handler(), for readability  (Xiu Jianfeng; 2 files, -3/+3)
Follow the naming pattern of the other sysctl handlers in perf. Signed-off-by: Xiu Jianfeng <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-04  eventfs: Remove eventfs_file and just use eventfs_inode  (Steven Rostedt (Google); 3 files, -103/+221)
Instead of having a descriptor for every file represented in the eventfs directory, only have the directory itself represented. Change the API to send in a list of entries that represent all the files in the directory (but not other directories). The entry list contains a name and a callback function that will be used to create the files when they are accessed.

  struct eventfs_inode *eventfs_create_events_dir(const char *name,
				struct dentry *parent,
				const struct eventfs_entry *entries,
				int size, void *data);

is used for the top level eventfs directory, and returns an eventfs_inode that will be used by:

  struct eventfs_inode *eventfs_create_dir(const char *name,
				struct eventfs_inode *parent,
				const struct eventfs_entry *entries,
				int size, void *data);

where both of the above take an array of struct eventfs_entry entries for every file that is in the directory. The entries are defined by:

  typedef int (*eventfs_callback)(const char *name, umode_t *mode,
				  void **data,
				  const struct file_operations **fops);

  struct eventfs_entry {
	const char		*name;
	eventfs_callback	callback;
  };

Where the name is the name of the file and the callback gets called when the file is being created. The callback passes in the name (in case the same callback is used for multiple files), a pointer to the mode, data and fops. The data will be pointing to the data that was passed in eventfs_create_dir() or eventfs_create_events_dir() but may be overridden to point to something else, as it will be used to point to the inode->i_private that is created. The information passed back from the callback is used to create the dentry/inode. If the callback fills the data and the file should be created, it must return a positive number. On zero or negative, the file is ignored. This logic may also be used as a prototype to convert entire pseudo file systems into just-in-time allocation. The "show_events_dentry" file has been updated to show the directories, and any files they have.
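As a rough usage sketch of the API described above (all names here, such as my_callback, my_enable_fops, my_entries and parent_ei, are invented for illustration and are not part of the patch):

  static int my_callback(const char *name, umode_t *mode, void **data,
			 const struct file_operations **fops)
  {
	/* Same callback for both files; pick the fops by name. */
	if (strcmp(name, "enable") == 0)
		*fops = &my_enable_fops;
	else
		*fops = &my_filter_fops;
	*mode = 0644;
	return 1;	/* positive return: create the file */
  }

  static const struct eventfs_entry my_entries[] = {
	{ .name = "enable", .callback = my_callback },
	{ .name = "filter", .callback = my_callback },
  };

  /* 'data' becomes the default inode->i_private of the created files */
  struct eventfs_inode *ei = eventfs_create_dir("my_event", parent_ei,
						my_entries,
						ARRAY_SIZE(my_entries),
						data);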
With just the eventfs_file allocations:

Before/after deltas for meminfo (in kB):

  MemFree:         -14360
  MemAvailable:    -14260
  Buffers:             40
  Cached:              24
  Active:              44
  Inactive:            48
  Inactive(anon):      28
  Active(file):        44
  Inactive(file):      20
  Dirty:               -4
  AnonPages:           28
  Mapped:               4
  KReclaimable:       132
  Slab:              1604
  SReclaimable:       132
  SUnreclaim:        1472
  Committed_AS:        12

Before/after deltas for slabinfo:

  <slab>:               <objects> [ * <size> = <total> ]
  ext4_inode_cache           27  [*  1184 =   31968 ]
  extent_status             102  [*    40 =    4080 ]
  tracefs_inode_cache       144  [*   656 =   94464 ]
  buffer_head                39  [*   104 =    4056 ]
  shmem_inode_cache          49  [*   800 =   39200 ]
  filp                      -53  [*   256 =  -13568 ]
  dentry                    251  [*   192 =   48192 ]
  lsm_file_cache            277  [*    32 =    8864 ]
  vm_area_struct            -14  [*   184 =   -2576 ]
  trace_event_file         1748  [*    88 =  153824 ]
  kmalloc-1k                 35  [*  1024 =   35840 ]
  kmalloc-256                49  [*   256 =   12544 ]
  kmalloc-192               -28  [*   192 =   -5376 ]
  kmalloc-128               -30  [*   128 =   -3840 ]
  kmalloc-96              10581  [*    96 = 1015776 ]
  kmalloc-64               3056  [*    64 =  195584 ]
  kmalloc-32               1291  [*    32 =   41312 ]
  kmalloc-16               2310  [*    16 =   36960 ]
  kmalloc-8                9216  [*     8 =   73728 ]

Free memory dropped by 14,360 kB
Available memory dropped by 14,260 kB
Total slab additions in size: 1,771,032 bytes

With this change:

Before/after deltas for meminfo (in kB):

  MemFree:         -12084
  MemAvailable:    -11976
  Buffers:             32
  Cached:              32
  Active:              72
  Inactive:           168
  Inactive(anon):     176
  Active(file):        72
  Inactive(file):      -8
  Dirty:               24
  AnonPages:          196
  Mapped:               8
  KReclaimable:       148
  Slab:               836
  SReclaimable:       148
  SUnreclaim:         688
  Committed_AS:       324

Before/after deltas for slabinfo:

  <slab>:               <objects> [ * <size> = <total> ]
  tracefs_inode_cache       144  [*   656 =   94464 ]
  shmem_inode_cache         -23  [*   800 =  -18400 ]
  filp                      -92  [*   256 =  -23552 ]
  dentry                    179  [*   192 =   34368 ]
  lsm_file_cache             -3  [*    32 =     -96 ]
  vm_area_struct            -13  [*   184 =   -2392 ]
  trace_event_file         1748  [*    88 =  153824 ]
  kmalloc-1k                -49  [*  1024 =  -50176 ]
  kmalloc-256               -27  [*   256 =   -6912 ]
  kmalloc-128              1864  [*   128 =  238592 ]
  kmalloc-64               4685  [*    64 =  299840 ]
  kmalloc-32                -72  [*    32 =   -2304 ]
  kmalloc-16                256  [*    16 =    4096 ]

  total = 721352

Free memory dropped by 12,084 kB
Available memory dropped by 11,976 kB
Total slab additions in size: 721,352 bytes

That's over 2 MB in savings per instance for free and available memory, and over 1 MB in savings per instance of slab memory.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Ajay Kaher <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-10-04  rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP  (Frederic Weisbecker; 1 file, -0/+7)
The callbacks migration is performed through an explicit call from the hotplug control CPU right after the death of the target CPU and before proceeding with the CPUHP_ teardown functions. This is unusual but necessary and yet uncommented. Summarize the reason as explained in the changelog of: a58163d8ca2c (rcu: Migrate callbacks earlier in the CPU-offline timeline) Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2023-10-04  rcu: Standardize explicit CPU-hotplug calls  (Frederic Weisbecker; 2 files, -7/+11)
rcu_report_dead() and rcutree_migrate_callbacks() have their headers in rcupdate.h while those are pure rcutree calls, like the other CPU-hotplug functions. Also rcu_cpu_starting() and rcu_report_dead() have different naming conventions while they mirror each other's effects. Fix the headers and propose a naming that relates both functions and aligns with the prefix of other rcutree CPU-hotplug functions. Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2023-10-04  rcu: Conditionally build CPU-hotplug teardown callbacks  (Frederic Weisbecker; 1 file, -60/+54)
Among the three CPU-hotplug teardown RCU callbacks, two of them exit early if CONFIG_HOTPLUG_CPU=n, and one is left unchanged; in any case all of them have an implementation when CONFIG_HOTPLUG_CPU=n. Align instead with the common way to deal with CPU-hotplug teardown callbacks and provide proper stubs when they are not supported. Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
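A sketch of the usual header-side stub pattern this aligns with (the exact symbols and NULL-stub choice are paraphrased assumptions, not quoted from the patch):

  #ifdef CONFIG_HOTPLUG_CPU
  int rcutree_dead_cpu(unsigned int cpu);
  int rcutree_dying_cpu(unsigned int cpu);
  int rcutree_offline_cpu(unsigned int cpu);
  #else
  /* cpuhp simply skips NULL teardown callbacks */
  #define rcutree_dead_cpu	NULL
  #define rcutree_dying_cpu	NULL
  #define rcutree_offline_cpu	NULL
  #endif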
2023-10-04  workqueue: Fix UAF report by KASAN in pwq_release_workfn()  (Zqiang; 1 file, -0/+6)
Currently, for an UNBOUND wq, if apply_wqattrs_prepare() returns an error, apply_wqattrs_cleanup() will be called and uses the pwq_release_worker kthread to release resources asynchronously. However, kfree(wq) is invoked directly in the failure path of alloc_workqueue(); if kfree(wq) has already been executed when pwq_release_workfn() accesses wq, this leads to the following scenario:

  BUG: KASAN: slab-use-after-free in pwq_release_workfn+0x339/0x380 kernel/workqueue.c:4124
  Read of size 4 at addr ffff888027b831c0 by task pool_workqueue_/3

  CPU: 0 PID: 3 Comm: pool_workqueue_ Not tainted 6.5.0-rc7-next-20230825-syzkaller #0
  Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
  Call Trace:
   <TASK>
   __dump_stack lib/dump_stack.c:88 [inline]
   dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
   print_address_description mm/kasan/report.c:364 [inline]
   print_report+0xc4/0x620 mm/kasan/report.c:475
   kasan_report+0xda/0x110 mm/kasan/report.c:588
   pwq_release_workfn+0x339/0x380 kernel/workqueue.c:4124
   kthread_worker_fn+0x2fc/0xa80 kernel/kthread.c:823
   kthread+0x33a/0x430 kernel/kthread.c:388
   ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
   </TASK>

  Allocated by task 5054:
   kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
   kasan_set_track+0x25/0x30 mm/kasan/common.c:52
   ____kasan_kmalloc mm/kasan/common.c:374 [inline]
   __kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383
   kmalloc include/linux/slab.h:599 [inline]
   kzalloc include/linux/slab.h:720 [inline]
   alloc_workqueue+0x16f/0x1490 kernel/workqueue.c:4684
   kvm_mmu_init_tdp_mmu+0x23/0x100 arch/x86/kvm/mmu/tdp_mmu.c:19
   kvm_mmu_init_vm+0x248/0x2e0 arch/x86/kvm/mmu/mmu.c:6180
   kvm_arch_init_vm+0x39/0x720 arch/x86/kvm/x86.c:12311
   kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1222 [inline]
   kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:5089 [inline]
   kvm_dev_ioctl+0xa31/0x1c20 arch/x86/kvm/../../../virt/kvm/kvm_main.c:5131
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:871 [inline]
   __se_sys_ioctl fs/ioctl.c:857 [inline]
   __x64_sys_ioctl+0x18f/0x210 fs/ioctl.c:857
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

  Freed by task 5054:
   kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
   kasan_set_track+0x25/0x30 mm/kasan/common.c:52
   kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:522
   ____kasan_slab_free mm/kasan/common.c:236 [inline]
   ____kasan_slab_free+0x15b/0x1b0 mm/kasan/common.c:200
   kasan_slab_free include/linux/kasan.h:164 [inline]
   slab_free_hook mm/slub.c:1800 [inline]
   slab_free_freelist_hook+0x114/0x1e0 mm/slub.c:1826
   slab_free mm/slub.c:3809 [inline]
   __kmem_cache_free+0xb8/0x2f0 mm/slub.c:3822
   alloc_workqueue+0xe76/0x1490 kernel/workqueue.c:4746
   kvm_mmu_init_tdp_mmu+0x23/0x100 arch/x86/kvm/mmu/tdp_mmu.c:19
   kvm_mmu_init_vm+0x248/0x2e0 arch/x86/kvm/mmu/mmu.c:6180
   kvm_arch_init_vm+0x39/0x720 arch/x86/kvm/x86.c:12311
   kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1222 [inline]
   kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:5089 [inline]
   kvm_dev_ioctl+0xa31/0x1c20 arch/x86/kvm/../../../virt/kvm/kvm_main.c:5131
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:871 [inline]
   __se_sys_ioctl fs/ioctl.c:857 [inline]
   __x64_sys_ioctl+0x18f/0x210 fs/ioctl.c:857
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x63/0xcd

This commit therefore flushes pwq_release_worker in alloc_and_link_pwqs() before invoking kfree(wq). Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=60db9f652c92d5bacba4 Signed-off-by: Zqiang <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
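A minimal sketch of the described fix, assuming it lands in the failure path of alloc_and_link_pwqs() (placement paraphrased from the changelog, not quoted from the patch):

  /* in alloc_and_link_pwqs(): */
	if (ret) {
		/*
		 * Make sure any pending pwq_release_workfn() has finished
		 * dereferencing wq before alloc_workqueue() goes on to
		 * kfree(wq) in its failure path.
		 */
		kthread_flush_worker(pwq_release_worker);
		return ret;
	}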
2023-10-04  cgroup/cpuset: Cleanup signedness issue in cpu_exclusive_check()  (Harshit Mogalapalli; 1 file, -7/+7)
Smatch complains about returning negative error codes from a function with a bool return type:

  kernel/cgroup/cpuset.c:705 cpu_exclusive_check() warn: signedness bug returning '(-22)'

The code works correctly, but it is confusing. The current behavior is that cpu_exclusive_check() returns true if it's *NOT* exclusive. Rename it to cpusets_are_exclusive() and reverse the returns so it returns true if it is exclusive and false if it's not. Update both callers as well. Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Harshit Mogalapalli <[email protected]> Reviewed-by: Kamalesh Babulal <[email protected]> Acked-by: Waiman Long <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
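A sketch of the renamed helper as described; the cpumask fields consulted here are an assumption for illustration, not the exact hunk:

  /* true if the exclusive CPUs of cs1 and cs2 do not overlap */
  static inline bool cpusets_are_exclusive(struct cpuset *cs1,
					   struct cpuset *cs2)
  {
	return !cpumask_intersects(cs1->exclusive_cpus,
				   cs2->exclusive_cpus);
  }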
2023-10-04  cgroup/cpuset: Enable invalid to valid local partition transition  (Waiman Long; 1 file, -31/+48)
When a local partition becomes invalid, it won't transition back to a valid partition automatically even if a proper "cpuset.cpus.exclusive" or "cpuset.cpus" change is made. Instead, system administrators have to explicitly echo "root" or "isolated" into the "cpuset.cpus.partition" file at the partition root. This patch now enables the automatic transition of an invalid local partition back to valid when a proper "cpuset.cpus.exclusive" or "cpuset.cpus" change is made. Automatic transition of an invalid remote partition to a valid one, however, is not covered by this patch; remote partitions still need an explicit write to "cpuset.cpus.partition" to become valid again. The test_cpuset_prs.sh test script is updated to add new test cases for this automatic state transition. Reported-by: Pierre Gondois <[email protected]> Link: https://lore.kernel.org/lkml/[email protected] Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-10-04  cgroup: add cgroup_favordynmods= command-line option  (Luiz Capitulino; 1 file, -4/+14)
We need to use favordynmods with cgroup v1, which doesn't support changing mount flags during remount. Enabling CONFIG_CGROUP_FAVOR_DYNMODS at build time is not an option because we want to be able to selectively enable it for certain systems. This commit addresses this by introducing the cgroup_favordynmods= command-line option. This option works for both cgroup v1 and v2 and also allows disabling favordynmods when the kernel is built with CONFIG_CGROUP_FAVOR_DYNMODS=y. Also, note that when cgroup_favordynmods=true, favordynmods is never disabled in cgroup_destroy_root(). Signed-off-by: Luiz Capitulino <[email protected]> Reviewed-by: Michal Koutný <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
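A plausible shape for such an option, assuming the kernel's usual __setup() pattern (a sketch, not the exact patch; the variable name is invented):

  static bool have_favordynmods __ro_after_init =
	IS_ENABLED(CONFIG_CGROUP_FAVOR_DYNMODS);

  static int __init cgroup_favordynmods_setup(char *str)
  {
	/* accepts true/false, 1/0, on/off; returns 1 when consumed */
	return kstrtobool(str, &have_favordynmods) == 0;
  }
  __setup("cgroup_favordynmods=", cgroup_favordynmods_setup);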
2023-10-04  PM: hibernate: Fix copying the zero bitmap to safe pages  (Pavankumar Kondeti; 1 file, -2/+2)
The following crash is observed 100% of the time during resume from hibernation on an x86 QEMU system.

  [ 12.931887] ? __die_body+0x1a/0x60
  [ 12.932324] ? page_fault_oops+0x156/0x420
  [ 12.932824] ? search_exception_tables+0x37/0x50
  [ 12.933389] ? fixup_exception+0x21/0x300
  [ 12.933889] ? exc_page_fault+0x69/0x150
  [ 12.934371] ? asm_exc_page_fault+0x26/0x30
  [ 12.934869] ? get_buffer.constprop.0+0xac/0x100
  [ 12.935428] snapshot_write_next+0x7c/0x9f0
  [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370
  [ 12.936530] ? submit_bio_noacct+0x44/0x2c0
  [ 12.937035] ? hib_submit_io+0xa5/0x110
  [ 12.937501] load_image+0x83/0x1a0
  [ 12.937919] swsusp_read+0x17f/0x1d0
  [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240
  [ 12.938967] load_image_and_restore+0x45/0xc0
  [ 12.939494] software_resume+0x13c/0x180
  [ 12.939994] resume_store+0xa3/0x1d0

The commit being fixed introduced a bug in copying the zero bitmap to safe pages. A temporary bitmap is allocated with the PG_ANY flag in prepare_image() to make a copy of the zero bitmap after the unsafe pages are marked. Freeing this temporary bitmap with PG_UNSAFE_KEEP later results in an inconsistent state of unsafe pages. Since the free bit is left as-is for this temporary bitmap after freeing, these pages are treated as unsafe pages when they are allocated again. This results in an incorrect calculation of the number of pages pre-allocated for the image:

  nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages;

The allocated_unsafe_pages count is estimated to be higher than the actual value, which results in running short of pages in safe_pages_list. Hence the crash is observed in get_buffer() due to a NULL pointer access of safe_pages_list. Fix this issue by creating the temporary zero bitmap from safe pages (free bit not set) so that the corresponding free bits can be cleared while freeing this bitmap. Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") Suggested-by: Brian Geffon <[email protected]> Signed-off-by: Pavankumar Kondeti <[email protected]> Reviewed-by: Brian Geffon <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
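In code terms the fix is essentially a one-flag change in prepare_image() (flag and function names as used elsewhere in kernel/power/snapshot.c; a sketch, not the verbatim hunk):

  /*
   * Allocate the temporary zero bitmap from safe pages so that its
   * free bits get cleared when the bitmap is freed later on.
   */
  error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE);	/* was PG_ANY */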
2023-10-04  crash_core.c: remove unneeded functions  (Baoquan He; 1 file, -18/+0)
So far, nobody calls functions parse_crashkernel_high() and parse_crashkernel_low(), remove both of them. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Jiahao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  crash_core: move crashk_*res definition into crash_core.c  (Baoquan He; 2 files, -17/+16)
Both crashk_res and crashk_low_res are used to mark the reserved crashkernel regions in the iomem_resource tree. And later the generic crashkernel reservation will be added into crash_core.c. So move the crashk_res and crashk_low_res definitions into crash_core.c to avoid a compile error if CONFIG_CRASH_CORE is set while CONFIG_KEXEC_CORE is unset. Meanwhile, include <asm/crash_core.h> in <linux/crash_core.h> if generic reservation is needed. In that case, <asm/crash_core.h> needs to be added by the arch. In asm/crash_core.h, the arch can provide its own macro definitions to override macros in <linux/crash_core.h> if needed. Wrap the include in the CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery scope to avoid compile errors in other arches which don't take the generic reservation way yet. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Jiahao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  crash_core: add generic function to do reservation  (Baoquan He; 1 file, -1/+106)
In architectures like x86_64, arm64 and riscv, the virtual address space is vast and the physical RAM is usually huge. Their crashkernel reservation doesn't have to be limited to below 4G RAM, but can be extended to the whole physical memory via crashkernel=,high support. Now add the function reserve_crashkernel_generic() to reserve crashkernel memory if users specify any of the kernel parameters, like crashkernel=xM[@offset] or crashkernel=,high|low. This is preparation to simplify the code of crashkernel=,high support in architectures. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Jiahao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
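For illustration, an architecture opting in would end up with glue along these lines (a sketch assuming the interfaces introduced by this series; using memblock_phys_mem_size() as the system RAM size is my assumption):

  static void __init arch_reserve_crashkernel(void)
  {
	unsigned long long crash_base, crash_size, low_size = 0;
	char *cmdline = boot_command_line;
	bool high = false;
	int ret;

	/* handles crashkernel=xM[@offset] as well as crashkernel=,high|low */
	ret = parse_crashkernel(cmdline, memblock_phys_mem_size(),
				&crash_size, &crash_base,
				&low_size, &high);
	if (ret)
		return;

	reserve_crashkernel_generic(cmdline, crash_size, crash_base,
				    low_size, high);
  }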
2023-10-04  crash_core: change parse_crashkernel() to support crashkernel=,high|low parsing  (Baoquan He; 1 file, -3/+33)
Now parse_crashkernel() is a real entry point for all kinds of crashkernel parsing on any architecture. Also wrap the crashkernel=,high|low handling inside the CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery scope. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Jiahao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  crash_core: change the prototype of function parse_crashkernel()  (Baoquan He; 1 file, -3/+12)
Add two parameters 'low_size' and 'high' to the function parse_crashkernel(); crashkernel=,high|low parsing will be added later. Make adjustments in all call sites of parse_crashkernel() in each arch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Jiahao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  crash_core.c: remove unnecessary parameter of function  (Baoquan He; 1 file, -4/+4)
Patch series "kdump: use generic functions to simplify crashkernel reservation in arch", v3.

In the current arm64 code, crashkernel=,high support has been finished after several rounds of posting and careful reviewing. The arm64 code, which first parses the crashkernel kernel parameters and then reserves memory, can be a good example for other arches to refer to. Whereas in x86_64, the code mixing crashkernel parameter parsing and memory reserving is twisted and looks messy. Refactoring the code to make it more readable and maintainable is necessary. Here, firstly abstract the crashkernel parameter parsing code into parse_crashkernel() to make it able to parse crashkernel=,high|low. Then abstract the crashkernel memory reserving code into a generic function reserve_crashkernel_generic(). Finally, in an arch where crashkernel=,high support is needed, a simple arch_reserve_crashkernel() can be added to call the above two functions. This removes the duplicated implementation code in each arch, like arm64, x86_64 and riscv. The following crashkernel= variants were exercised:

  crashkernel=512M,high
  crashkernel=512M,high crashkernel=256M,low
  crashkernel=512M,high crashkernel=0M,low
  crashkernel=0M,high crashkernel=256M,low
  crashkernel=512M
  crashkernel=512M@0x4f000000
  crashkernel=1G-4G:256M,4G-64G:320M,64G-:576M
  crashkernel=0M

This patch (of 9): In all call sites of __parse_crashkernel(), the parameter 'name' is hardcoded as "crashkernel=". So remove the unnecessary parameter 'name' and add a local variable 'name' inside __parse_crashkernel() instead. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Jiahao <[email protected]> Cc: Zhen Lei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  pid: pid_ns_ctl_handler: remove useless comment  (Rong Tao; 1 file, -6/+0)
Commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") removed the 'last_pid' element and uses the idr_get_cursor()/idr_set_cursor() pair to set the value of the idr, so the now-useless comments should be removed. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rong Tao <[email protected]> Cc: Aleksa Sarai <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Kees Cook <[email protected]> Cc: Luis Chamberlain <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  kthread: add kthread_stop_put  (Andreas Gruenbacher; 3 files, -12/+24)
Add a kthread_stop_put() helper that stops a thread and puts its task struct. Use it to replace the various instances of kthread_stop() followed by put_task_struct(). Remove the kthread_stop_put() macro in usbip that is similar but doesn't return the result of kthread_stop(). [[email protected]: fix kerneldoc comment] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: document kthread_stop_put()'s argument] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andreas Gruenbacher <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
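Based on that description, the helper is essentially (a sketch):

  /*
   * Stop the kthread and drop the task_struct reference, preserving
   * kthread_stop()'s return value.
   */
  int kthread_stop_put(struct task_struct *k)
  {
	int ret;

	ret = kthread_stop(k);
	put_task_struct(k);
	return ret;
  }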
2023-10-04  taskstats: fill_stats_for_tgid: use for_each_thread()  (Oleg Nesterov; 1 file, -3/+2)
do/while_each_thread should be avoided when possible. Plus I _think_ this change allows us to avoid lock_task_sighand(), but I am not sure; I forgot everything about taskstats. In any case, this code does not look right in that the same thread can be accounted twice: taskstats_exit() can account the exiting thread in signal->stats and drop ->siglock, but this thread is still on the thread-group list, so lock_task_sighand() can't help. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
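The general shape of this conversion, which the following getrusage and complete_signal patches repeat (illustrative only; accumulate_stats() is an invented stand-in, not the real code):

  /* before */
  t = first;
  do {
	accumulate_stats(stats, t);
  } while_each_thread(first, t);

  /* after */
  for_each_thread(first, t)
	accumulate_stats(stats, t);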
2023-10-04  getrusage: use __for_each_thread()  (Oleg Nesterov; 1 file, -3/+1)
do/while_each_thread should be avoided when possible. Plus this change allows us to avoid lock_task_sighand(); we can use RCU and/or sig->stats_lock instead. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  getrusage: add the "signal_struct *sig" local variable  (Oleg Nesterov; 1 file, -18/+19)
No functional changes, cleanup/preparation. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  signal: complete_signal: use __for_each_thread()  (Oleg Nesterov; 1 file, -3/+2)
do/while_each_thread should be avoided when possible. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  panic: use atomic_try_cmpxchg in panic() and nmi_panic()  (Uros Bizjak; 1 file, -9/+13)
Use atomic_try_cmpxchg() instead of the atomic_cmpxchg(ptr, old, new) == old pattern in panic() and nmi_panic(). The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after the cmpxchg (and the related move instruction in front of the cmpxchg). Also, rename the cpu variable to this_cpu in nmi_panic() and try to unify the logic flow between panic() and nmi_panic(). No functional change intended. [[email protected]: clean up if/else block] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uros Bizjak <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
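The shape of the transformation on panic_cpu (an illustrative sketch, not the exact hunk):

  /* before: a separate compare is needed after the cmpxchg */
  old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
  if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
	nmi_panic_self_stop(regs);

  /* after: the compare is folded into try_cmpxchg, which also updates
   * old_cpu with the observed value on failure */
  old_cpu = PANIC_CPU_INVALID;
  if (!atomic_try_cmpxchg(&panic_cpu, &old_cpu, this_cpu) &&
      old_cpu != this_cpu)
	nmi_panic_self_stop(regs);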
2023-10-04  __kill_pgrp_info: simplify the calculation of return value  (Oleg Nesterov; 1 file, -6/+11)
No need to calculate/check the "success" variable; we can kill it and update retval in the main loop unless it is zero. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Suggested-by: David Laight <[email protected]> Cc: Eric W. Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
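A sketch of the simplified loop as described (paraphrased, not the verbatim hunk): retval starts at -ESRCH and stops being updated once it reaches zero, so a single successful send wins.

  retval = -ESRCH;
  do_each_pid_task(pgrp, PIDTYPE_PGID, p) {
	int err = group_send_sig_info(sig, info, p, PIDTYPE_PGID);

	if (retval)
		retval = err;
  } while_each_pid_task(pgrp, PIDTYPE_PGID, p);
  return retval;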
2023-10-04  kill task_struct->thread_group  (Oleg Nesterov; 2 files, -4/+0)
The last user was removed by the previous patch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  docs: fix link s390/zfcpdump.rst  (Costa Shulyupin; 1 file, -1/+1)
Fix the link after the move of Documentation/s390 to Documentation/arch/s390. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Costa Shulyupin <[email protected]> Cc: Baoquan He <[email protected]> Cc: Eric DeVolder <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Sourabh Jain <[email protected]> Cc: Heiko Carstens <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  rcu: dynamically allocate the rcu-kfree shrinker  (Qi Zheng; 1 file, -9/+12)
Use new APIs to dynamically allocate the rcu-kfree shrinker. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qi Zheng <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Reviewed-by: Muchun Song <[email protected]> Cc: Abhinav Kumar <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alyssa Rosenzweig <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Anna Schumaker <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Bob Peterson <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Carlos Llamas <[email protected]> Cc: Chandan Babu R <[email protected]> Cc: Chao Yu <[email protected]> Cc: Chris Mason <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christian Koenig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Coly Li <[email protected]> Cc: Dai Ngo <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: "Darrick J. Wong" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Airlie <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Sterba <[email protected]> Cc: Dmitry Baryshkov <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Huang Rui <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Jeffle Xu <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Marijn Suijten <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Muchun Song <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Neil Brown <[email protected]> Cc: Oleksandr Tyshchenko <[email protected]> Cc: Olga Kornievskaia <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rob Clark <[email protected]> Cc: Rob Herring <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sean Paul <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Song Liu <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Steven Price <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tomeu Vizoso <[email protected]> Cc: Tom Talpey <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xuan Zhuo <[email protected]> Cc: Yue Hu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  rcu: dynamically allocate the rcu-lazy shrinker  (Qi Zheng; 1 file, -9/+10)
Use new APIs to dynamically allocate the rcu-lazy shrinker. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qi Zheng <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Acked-by: Muchun Song <[email protected]> Cc: Abhinav Kumar <[email protected]> Cc: Alasdair Kergon <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Alyssa Rosenzweig <[email protected]> Cc: Andreas Dilger <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Anna Schumaker <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Bob Peterson <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Carlos Llamas <[email protected]> Cc: Chandan Babu R <[email protected]> Cc: Chao Yu <[email protected]> Cc: Chris Mason <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Christian Koenig <[email protected]> Cc: Chuck Lever <[email protected]> Cc: Coly Li <[email protected]> Cc: Dai Ngo <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: "Darrick J. Wong" <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Airlie <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Sterba <[email protected]> Cc: Dmitry Baryshkov <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Huang Rui <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jaegeuk Kim <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Jan Kara <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Jeffle Xu <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Marijn Suijten <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Mike Snitzer <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Neil Brown <[email protected]> Cc: Oleksandr Tyshchenko <[email protected]> Cc: Olga Kornievskaia <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rob Clark <[email protected]> Cc: Rob Herring <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sean Paul <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Song Liu <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Steven Price <[email protected]> Cc: "Theodore Ts'o" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tomeu Vizoso <[email protected]> Cc: Tom Talpey <[email protected]> Cc: Trond Myklebust <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Xuan Zhuo <[email protected]> Cc: Yue Hu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  mm: remove remnants of SPLIT_RSS_COUNTING  (Mateusz Guzik; 3 files, -9/+0)
The feature got retired in f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter"), but the patch failed to fully clean it up. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mateusz Guzik <[email protected]> Acked-by: Shakeel Butt <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-04  futex/requeue: Remove unnecessary 'NULL' initialization from futex_proxy_trylock_atomic()  (Li zeming; 1 file, -1/+1)
'top_waiter' is assigned unconditionally before first use, so it does not need an initialization. [ mingo: Created legible changelog. ] Signed-off-by: Li zeming <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-04  rcu: Assume rcu_report_dead() is always called locally  (Frederic Weisbecker; 2 files, -3/+3)
rcu_report_dead() has to be called locally by the CPU that is going to exit the RCU state machine. Passing a cpu argument here is error-prone and leaves the possibility for a racy remote call. Use local access instead. Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2023-10-04  rcu: Assume IRQS disabled from rcu_report_dead()  (Frederic Weisbecker; 1 file, -4/+6)
rcu_report_dead() is the last RCU word from the CPU down through the hotplug path. It is called in the idle loop right before the CPU shuts down for good. Because it removes the CPU from the grace period state machine and reports an ultimate quiescent state if necessary, no further use of RCU is allowed. Therefore it is expected that IRQs are disabled upon calling this function and are not to be re-enabled again until the CPU shuts down. Remove the IRQs disablement from that function and verify instead that it is actually called with IRQs disabled as it is expected at that special point in the idle path. Reviewed-by: Joel Fernandes (Google) <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2023-10-04  rcu: Use rcu_segcblist_segempty() instead of open coding it  (Frederic Weisbecker; 1 file, -2/+2)
This makes the code more readable. Reviewed-by: Qiuxu Zhuo <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2023-10-04  rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects  (Catalin Marinas; 1 file, -0/+9)
Since the actual slab freeing is deferred when calling kvfree_rcu(), so is the kmemleak_free() callback informing kmemleak of the object deletion. From the perspective of the kvfree_rcu() caller, the object is freed and it may remove any references to it. Since kmemleak does not scan RCU internal data storing the pointer, it will report such objects as leaks during the grace period. Tell kmemleak to ignore such objects on the kvfree_call_rcu() path. Note that the tiny RCU implementation does not have such issue since the objects can be tracked from the rcu_ctrlblk structure. Signed-off-by: Catalin Marinas <[email protected]> Reported-by: Christoph Paasch <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Cc: <[email protected]> Tested-by: Christoph Paasch <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
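A sketch of what that amounts to on the kvfree_call_rcu() queueing path (placement paraphrased from the changelog; kmemleak_ignore() is the existing kmemleak API):

  /*
   * The kvfree_rcu() caller considers the pointer freed at this point
   * and may drop all references to it. kmemleak does not scan the RCU
   * internal queues holding the pointer, so stop tracking the object
   * to avoid a false leak report during the grace period.
   */
  kmemleak_ignore(ptr);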
2023-10-04  Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (Jakub Kicinski; 3 files, -30/+25)
Daniel Borkmann says:

====================
pull-request: bpf 2023-10-02

We've added 11 non-merge commits during the last 12 day(s) which contain a total of 12 files changed, 176 insertions(+), 41 deletions(-).

The main changes are:

1) Fix BPF verifier to reset backtrack_state masks on global function exit as otherwise subsequent precision tracking would reuse them, from Andrii Nakryiko.

2) Several sockmap fixes for available bytes accounting, from John Fastabend.

3) Reject sk_msg egress redirects to non-TCP sockets given this is only supported for TCP sockets today, from Jakub Sitnicki.

4) Fix a syzkaller splat in bpf_mprog when hitting maximum program limits with BPF_F_BEFORE directive, from Daniel Borkmann and Nikolay Aleksandrov.

5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust size_index for selecting a bpf_mem_cache, from Hou Tao.

6) Fix arch_prepare_bpf_trampoline return code for s390 JIT, from Song Liu.

7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off, from Leon Hwang.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Use kmalloc_size_roundup() to adjust size_index
  selftest/bpf: Add various selftests for program limits
  bpf, mprog: Fix maximum program check on mprog attachment
  bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets
  bpf, sockmap: Add tests for MSG_F_PEEK
  bpf, sockmap: Do not inc copied_seq when PEEK flag set
  bpf: tcp_read_skb needs to pop skb regardless of seq
  bpf: unconditionally reset backtrack_state masks on global func exit
  bpf: Fix tr dereferencing
  selftests/bpf: Check bpf_cubic_acked() is called via struct_ops
  s390/bpf: Let arch_prepare_bpf_trampoline return program size
====================

Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-03  tracing/user_events: Allow events to persist for perfmon_capable users  (Beau Belgrave; 1 file, -14/+22)
There are several scenarios that have come up where it is desirable to have a user_event persist even if the process that registered it exits. The main one is having a daemon create events on bootup that shouldn't get deleted if the daemon has to exit or reload. Another is within OpenTelemetry exporters: they wish to check whether a user_event exists on the system to determine if exporting the data should occur. The user_event in this case must exist even in the absence of the owning process running (such as the above daemon case). Expose the previously internal flag USER_EVENT_REG_PERSIST to user processes. Upon register or delete of events with this flag, ensure the user is perfmon_capable to prevent random user processes with access to tracefs from creating events that persist after exit. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
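A sketch of the described gate (the flag name comes from the changelog; the exact placement and error code are assumptions):

  /* on register/delete of an event carrying the persist flag */
  if ((flags & USER_EVENT_REG_PERSIST) && !perfmon_capable())
	return -EPERM;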
2023-10-03  ring_buffer: Use try_cmpxchg instead of cmpxchg in rb_insert_pages  (Uros Bizjak; 1 file, -4/+4)
Use try_cmpxchg() instead of the cmpxchg(ptr, old, new) == old pattern in rb_insert_pages(). The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after the cmpxchg (and the related move instruction in front of the cmpxchg). No functional change intended. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Steven Rostedt <[email protected]> Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-10-03  tracing: Expand all ring buffers individually  (Zheng Yejian; 3 files, -33/+45)
The ring buffer of global_trace is set to the minimum size on boot in order to save memory, and it is then expanded when some trace feature is enabled. However, currently, operations on an instance can also cause the global_trace ring buffer to be expanded, and the expanded memory would be wasted if global_trace is then not used. See the following case: we enable the 'sched_switch' event in instance 'A', and the ring buffer of global_trace is unexpectedly expanded to 1410 kB; also the '(expanded: 1408)' from 'buffer_size_kb' of the instance is confusing.

  # cd /sys/kernel/tracing
  # mkdir instances/A
  # cat buffer_size_kb
  7 (expanded: 1408)
  # cat instances/A/buffer_size_kb
  1410 (expanded: 1408)
  # echo sched:sched_switch > instances/A/set_event
  # cat buffer_size_kb
  1410
  # cat instances/A/buffer_size_kb
  1410

To fix it, we can:
  - Make 'ring_buffer_expanded' a member of 'struct trace_array';
  - Make 'ring_buffer_expanded' of an instance default to true, while that of global_trace defaults to false;
  - In order not to expose 'global_trace' outside of the file kernel/trace/trace.c, introduce trace_set_ring_buffer_expanded() to set 'ring_buffer_expanded' to true (see the sketch after this message);
  - Pass the relevant trace_array to tracing_update_buffers().

Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
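A sketch of the accessor named in the list above (close to what the description implies, but paraphrased rather than quoted from the patch):

  /* keeps 'global_trace' private to kernel/trace/trace.c */
  void trace_set_ring_buffer_expanded(struct trace_array *tr)
  {
	if (!tr)
		tr = &global_trace;
	tr->ring_buffer_expanded = true;
  }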
2023-10-03  sched/headers: Remove duplicate header inclusions  (Yu Liao; 2 files, -2/+0)
<linux/psi.h> and "autogroup.h" are included twice, remove the duplicate header inclusion. Signed-off-by: Yu Liao <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-03  syscalls: Cleanup references to sys_lookup_dcookie()  (Sohil Mehta; 1 file, -2/+0)
commit 'be65de6b03aa ("fs: Remove dcookies support")' removed the syscall definition for lookup_dcookie. However, syscall tables still point to the old sys_lookup_dcookie() definition. Update syscall tables of all architectures to directly point to sys_ni_syscall() instead. Signed-off-by: Sohil Mehta <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Acked-by: Namhyung Kim <[email protected]> # for perf Acked-by: Russell King (Oracle) <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2023-10-03  sched/eevdf: Fix avg_vruntime()  (Peter Zijlstra; 1 file, -1/+9)
The expectation is that placing a task at avg_vruntime() makes it eligible. It turns out there is a corner case where this is not so. Specifically, avg_vruntime() relies on the fact that integer division is a flooring function (i.e., it discards the remainder). By this property the value returned is slightly left of the true average. However, when the average is negative (relative to min_vruntime) the effect is flipped and the division becomes a ceil, with the result that the returned value is just right of the average and thus not eligible. Fixes: af4cf40470c2 ("sched/fair: Add cfs_rq::avg_vruntime") Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
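In code, the fix amounts to making the signed division floor toward negative infinity; a sketch of the corrected tail of avg_vruntime() (paraphrased, not the verbatim hunk):

  if (load) {
	/* make the division floor for negative averages too, so the
	 * result stays (slightly) left of the true average */
	if (avg < 0)
		avg -= (load - 1);
	avg = div_s64(avg, load);
  }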
2023-10-03  sched/eevdf: Also update slice on placement  (Peter Zijlstra; 1 file, -2/+4)
Tasks that never consume their full slice would not update their slice value. This means that tasks that are spawned before the sysctl scaling keep their original (UP) slice length. Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-10-03  locking/debug: Fix debugfs API return value checks to use IS_ERR()  (Atul Kumar Pant; 1 file, -5/+5)
Update the checking of return values from debugfs_create_file() and debugfs_create_dir() to use IS_ERR(). Signed-off-by: Atul Kumar Pant <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Waiman Long <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-10-03  Merge tag 'v6.6-rc4' into perf/core, to pick up fixes  (Ingo Molnar; 131 files, -2829/+7423)
Signed-off-by: Ingo Molnar <[email protected]>
2023-10-02  sched/rt: Disallow writing invalid values to sched_rt_period_us  (Cyril Hrubis; 1 file, -4/+5)
The validation of the value written to sched_rt_period_us was broken because:

  - sysctl_sched_rt_period is declared as an unsigned int
  - it is parsed by proc_dointvec()
  - the range is asserted after the value is parsed by proc_dointvec()

Because of this, negative values written to the file were stored in an unsigned integer and were later interpreted as large positive integers, which passed the check:

  if (sysctl_sched_rt_period <= 0)
	return -EINVAL;

This commit fixes the parsing by setting an explicit range for both period_us and runtime_us in the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead. Alternatively, if we wanted to use the full range of unsigned int for the period value, we would have to split the proc_handler and use proc_douintvec() for it; however, even Documentation/scheduler/sched-rt-group.rst describes the range as 1 to INT_MAX. As far as I can tell, the only problem this causes is that the sysctl file allows writing negative values which, when read back, may confuse userspace. There is also an LTP test being submitted for these sysctl files at: http://patchwork.ozlabs.org/project/ltp/patch/[email protected]/ Signed-off-by: Cyril Hrubis <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
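A sketch of the resulting sysctl entries (bounds per Documentation/scheduler/sched-rt-group.rst; the runtime lower bound of -1 meaning "no limit" is my reading of the interface, not a quote from the patch):

  static struct ctl_table sched_rt_sysctls[] = {
	{
		.procname	= "sched_rt_period_us",
		.data		= &sysctl_sched_rt_period,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= sched_rt_handler,
		.extra1		= SYSCTL_ONE,
		.extra2		= SYSCTL_INT_MAX,
	},
	{
		.procname	= "sched_rt_runtime_us",
		.data		= &sysctl_sched_rt_runtime,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= sched_rt_handler,
		.extra1		= SYSCTL_NEG_ONE,
		.extra2		= SYSCTL_INT_MAX,
	},
	{}
  };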
2023-10-01  Merge tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  (Linus Torvalds; 1 file, -0/+17)
Pull misc fixes from Andrew Morton: "Fourteen hotfixes, eleven of which are cc:stable. The remainder pertain to issues which were introduced after 6.5"

* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  Crash: add lock to serialize crash hotplug handling
  selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
  mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
  mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
  mm, memcg: reconsider kmem.limit_in_bytes deprecation
  mm: zswap: fix potential memory corruption on duplicate store
  arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
  mm: hugetlb: add huge page size param to set_huge_pte_at()
  maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
  maple_tree: add mas_is_active() to detect in-tree walks
  nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
  mm: abstract moving to the next PFN
  mm: report success more often from filemap_map_folio_range()
  fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
2023-10-01  Merge tag 'sched-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -0/+1)
Pull scheduler fix from Ingo Molnar: "Fix a RT tasks related lockup/live-lock during CPU offlining"

* tag 'sched-urgent-2023-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/rt: Fix live lock between select_fallback_rq() and RT push
2023-09-30  Merge tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace  (Linus Torvalds; 3 files, -7/+55)
Pull tracing fixes from Steven Rostedt:

 - Make sure 32-bit applications using user events have aligned access when running on a 64-bit kernel.

 - Add cond_resched in the loop that handles converting enums in print_fmt strings in trace events.

 - Fix premature wake-ups of polling processes in the tracing ring buffer. When a task polls waiting for a percentage of the ring buffer to be filled, the writer still will wake it up at every event. Add the polling's percentage to the "shortest_full" list to tell the writer when to wake it up.

 - For eventfs dir lookups on dynamic events, an event system's only event could be removed, leaving its dentry with no children. This is totally legitimate. But in eventfs_release() it must not access the children array, as it is only allocated when the dentry has children.

* tag 'trace-v6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Test for dentries array allocated in eventfs_release()
  tracing/user_events: Align set_bit() address for all archs
  tracing: relax trace_event_eval_update() execution with cond_resched()
  ring-buffer: Update "shortest_full" in polling
2023-09-30  tracing/user_events: Align set_bit() address for all archs  (Beau Belgrave; 1 file, -7/+51)
All architectures should use a long-aligned address passed to set_bit(). User processes can pass either a 32-bit or 64-bit sized value to be updated when tracing is enabled on a 64-bit kernel. Both cases are ensured to be naturally aligned; however, that is not enough. The address must be long aligned without affecting checks on the value within the user process, which require different adjustments of the bit for little- and big-endian CPUs. Add a compat flag to user_event_enabler that indicates when a 32-bit value is being used on a 64-bit kernel. Long-align addresses and correct the bit to be used by set_bit() to account for this alignment. Ensure compat flags are copied during forks and used during deletion clears. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Link: https://lore.kernel.org/linux-trace-kernel/[email protected]/ Cc: [email protected] Fixes: 7235759084a4 ("tracing/user_events: Use remote writes for event enablement") Reported-by: Clément Léger <[email protected]> Suggested-by: Clément Léger <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
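An illustration of the little-endian case of that alignment math (names and arithmetic here are simplified stand-ins, not the kernel's exact code, which writes through a pinned user page):

  unsigned long uaddr  = enabler_addr;               /* user value address */
  unsigned long offset = uaddr & (sizeof(long) - 1); /* bytes past a long */
  unsigned long *word  = (unsigned long *)(uaddr - offset);
  int bit = enable_bit + offset * BITS_PER_BYTE;     /* big endian differs */

  set_bit(bit, word);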
2023-09-30  tracing: relax trace_event_eval_update() execution with cond_resched()  (Clément Léger; 1 file, -0/+1)
When the kernel is compiled without preemption, eval_map_work_func() (which calls trace_event_eval_update()) will not be preempted up to its complete execution. This can actually cause a problem since if another CPU calls stop_machine(), the call will have to wait for the eval_map_work_func() function to finish executing in the workqueue before being able to be scheduled. This problem was observed on an SMP system at boot time, when the CPU calling the initcalls executed clocksource_done_booting(), which in the end calls stop_machine(). We observed a 1-second delay because one CPU was executing eval_map_work_func() and was not preempted by the stop_machine() task. Adding a call to cond_resched() in trace_event_eval_update() allows other tasks to be executed and thus continue working asynchronously like before without blocking any pending task at boot time. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Signed-off-by: Clément Léger <[email protected]> Tested-by: Atish Patra <[email protected]> Reviewed-by: Atish Patra <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
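The shape of the change (a sketch; the loop body is paraphrased, not the exact hunk):

  list_for_each_entry(call, &ftrace_events, list) {
	/* ... rewrite enum symbols in call->print_fmt ... */

	/* give other tasks (e.g. a pending stop_machine()) a chance
	 * to run on non-preemptible kernels */
	cond_resched();
  }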