path: root/kernel
Age | Commit message | Author | Files | Lines
2023-12-29 | kernel: relay: remove relay_file_splice_read dead code, doesn't work | Ahelenia Ziemiańska | 1 | -162/+0

Documentation/filesystems/relay.rst says to use

    return debugfs_create_file(filename, mode, parent, buf, &relay_file_operations);

and this is the only way relay_file_operations is used. Thus:

    debugfs_create_file(&relay_file_operations)
    -> __debugfs_create_file(&debugfs_full_proxy_file_operations, &relay_file_operations)
    -> dentry{inode: {i_fop: &debugfs_full_proxy_file_operations},
              d_fsdata: &relay_file_operations | DEBUGFS_FSDATA_IS_REAL_FOPS_BIT}

debugfs_full_proxy_file_operations.open is full_proxy_open, which extracts the &relay_file_operations from the dentry, and allocates via __full_proxy_fops_init() new fops, with trivial wrappers around release, llseek, read, write, poll, and unlocked_ioctl, then replaces the fops on the opened file therewith.

Naturally, all thusly-created debugfs files have .splice_read = NULL.

This was introduced in commit 49d200deaa68 ("debugfs: prevent access to removed files' private data") from 2016-03-22.

AFAICT, relay_file_operations is the only struct file_operations used for debugfs which defines a .splice_read callback. Hooking it up with

> diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
> index 5063434be0fc..952fcf5b2afa 100644
> --- a/fs/debugfs/file.c
> +++ b/fs/debugfs/file.c
> @@ -328,6 +328,11 @@ FULL_PROXY_FUNC(write, ssize_t, filp,
>                     loff_t *ppos),
>              ARGS(filp, buf, size, ppos));
>
> +FULL_PROXY_FUNC(splice_read, long, in,
> +               PROTO(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe,
> +                     size_t len, unsigned int flags),
> +               ARGS(in, ppos, pipe, len, flags));
> +
>  FULL_PROXY_FUNC(unlocked_ioctl, long, filp,
>              PROTO(struct file *filp, unsigned int cmd, unsigned long arg),
>              ARGS(filp, cmd, arg));
> @@ -382,6 +387,8 @@ static void __full_proxy_fops_init(struct file_operations *proxy_fops,
>                 proxy_fops->write = full_proxy_write;
>         if (real_fops->poll)
>                 proxy_fops->poll = full_proxy_poll;
> +       if (real_fops->splice_read)
> +               proxy_fops->splice_read = full_proxy_splice_read;
>         if (real_fops->unlocked_ioctl)
>                 proxy_fops->unlocked_ioctl = full_proxy_unlocked_ioctl;
>  }

shows it just doesn't work, and splicing always instantly returns empty (subsequent reads actually return the contents).

No-one noticed it became dead code in 2016, who knows if it worked back then. Clearly no-one cares; just delete it.

Link: https://lkml.kernel.org/r/dtexwpw6zcdx7dkx3xj5gyjp5syxmyretdcbcdtvrnukd4vvuh@tarta.nabijaczleweli.xyz
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Li kunyu <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Zhang Zhengming <[email protected]>
Cc: Zhao Lei <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-12-29 | kexec_file: fix incorrect temp_start value in locate_mem_hole_top_down() | Yuntao Wang | 1 | -2/+2

temp_end represents the address of the last available byte. Therefore, the starting address of the memory segment with temp_end as its last available byte and a size of `kbuf->memsz`, that is, the value of temp_start, should be `temp_end - kbuf->memsz + 1` instead of `temp_end - kbuf->memsz`.

Additionally, use the ALIGN_DOWN macro instead of open-coding it directly in locate_mem_hole_top_down() to improve code readability.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yuntao Wang <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
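A minimal sketch of the corrected arithmetic (names taken from the changelog; the surrounding search loop is omitted):

    /* temp_end is the address of the last available byte, so the first
     * byte of a kbuf->memsz-sized segment ending there is: */
    temp_start = temp_end - kbuf->memsz + 1;
    /* round down with the helper instead of open-coding the masking: */
    temp_start = ALIGN_DOWN(temp_start, kbuf->buf_align);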
2023-12-29 | kexec: modify the meaning of the end parameter in kimage_is_destination_range() | Yuntao Wang | 1 | -4/+4

The end parameter received by kimage_is_destination_range() should be the last valid byte address of the target memory segment plus 1. However, in the locate_mem_hole_bottom_up() and locate_mem_hole_top_down() functions, the corresponding value passed to kimage_is_destination_range() is the last valid byte address of the target memory segment, which is 1 less.

There are two ways to fix this bug. We can either correct the logic of the locate_mem_hole_bottom_up() and locate_mem_hole_top_down() functions, or we can fix kimage_is_destination_range() by making the end parameter represent the last valid byte address of the target memory segment. Here, we choose the second approach.

Due to the modification to kimage_is_destination_range(), we also need to adjust its callers, such as kimage_alloc_normal_control_pages() and kimage_alloc_page().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yuntao Wang <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
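A hypothetical helper illustrating the adjusted contract (this is not the kernel's code): with an inclusive end address, the overlap test compares the last bytes directly:

    /* Sketch: [start, end] and [mstart, mend] are inclusive byte ranges;
     * they overlap unless one lies entirely past the other. */
    static bool ranges_overlap(unsigned long start, unsigned long end,
                               unsigned long mstart, unsigned long mend)
    {
            return !(end < mstart || mend < start);
    }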
2023-12-29 | kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte() | David Hildenbrand | 1 | -1/+1

Let's convert __replace_page().

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yin Fengwei <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-12-29 | mm: remove some calls to page_add_new_anon_rmap() | Matthew Wilcox (Oracle) | 1 | -1/+1

We already have the folio in these functions, we just need to use it. folio_add_new_anon_rmap() didn't exist at the time they were converted to folios.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-12-29 | tracing: Fix blocked reader of snapshot buffer | Steven Rostedt (Google) | 2 | -4/+19

If an application blocks on the snapshot or snapshot_raw files, expecting to be woken up when a snapshot occurs, it will not happen. Or it may happen with an unexpected result. That result is that the application will be reading the main buffer instead of the snapshot buffer. That is because when the snapshot occurs, the main and snapshot buffers are swapped. But the reader has a descriptor still pointing to the buffer that it originally connected to.

This is fine for the main buffer readers, as they may be blocked waiting for a watermark to be hit, and when a snapshot occurs, the data that the main readers want is now on the snapshot buffer.

But for waiters of the snapshot buffer, they are waiting for an event to occur that will trigger the snapshot and they can then consume it quickly to save the snapshot before the next snapshot occurs. But to do this, they need to read the new snapshot buffer, not the old one that is now receiving new data.

Also, it does not make sense to have a watermark "buffer_percent" on the snapshot buffer, as the snapshot buffer is static and does not receive new data except all at once.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Mathieu Desnoyers <[email protected]>
Cc: Mark Rutland <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Fixes: debdd57f5145f ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-29 | ring-buffer: Fix wake ups when buffer_percent is set to 100 | Steven Rostedt (Google) | 1 | -2/+7

The tracefs file "buffer_percent" is to allow user space to set a water-mark on how much of the tracing ring buffer needs to be filled in order to wake up a blocked reader.

    0   - wait until any data is in the buffer
    1   - wait for 1% of the sub buffers to be filled
    50  - wait until half of the sub buffers are filled with data
    100 - do not wake the waiter until the ring buffer is completely full

Unfortunately the test for being full was:

    dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
    return (dirty * 100) > (full * nr_pages);

where "full" is the value for "buffer_percent". There are two issues with the above when full == 100:

1. dirty * 100 > 100 * nr_pages will never be true. That is, the above is basically saying that if the user sets buffer_percent to 100, more pages need to be dirty than exist in the ring buffer!

2. The page that the writer is on is never considered dirty, as dirty pages are only those that are full. When the writer goes to a new sub-buffer, it clears the contents of that sub-buffer. That is, even if the check was ">=" it would still not be equal, as the most pages that can be considered "dirty" is nr_pages - 1.

To fix this, add one to dirty and use ">=" in the compare.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Fixes: 03329f9939781 ("tracing: Add tracefs file buffer_percentage")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-29 | sched/fair: Fix tg->load when offlining a CPU | Vincent Guittot | 1 | -0/+52

When a CPU is taken offline, the contribution of its cfs_rqs to task_groups' load may remain and will negatively impact the calculation of the share of the online CPUs.

To fix this bug, clear the contribution of an offlining CPU to task groups' load and skip its contribution while it is inactive.

Here's the reproducer of the anomaly, by Imran Khan:

 "So far I have encountered only one rather lengthy way of reproducing this issue, which is as follows:

  1. Take a KVM guest (booted with 4 CPUs and can be scaled up to 124 CPUs) and create 2 custom cgroups: /sys/fs/cgroup/cpu/test_group_1 and /sys/fs/cgroup/cpu/test_group_2

  2. Assign a CPU intensive workload to each of these cgroups and start the workload. For my tests I am using following app:

    int main(int argc, char *argv[])
    {
        unsigned long count, i, val;

        if (argc != 2) {
            printf("usage: ./a.out <number of random nums to generate> \n");
            return 0;
        }

        count = strtoul(argv[1], NULL, 10);

        printf("Generating %lu random numbers \n", count);
        for (i = 0; i < count; i++) {
            val = rand();
            val = val % 2;
            //usleep(1);
        }
        printf("Generated %lu random numbers \n", count);
        return 0;
    }

  Also since the system is booted with 4 CPUs, in order to completely load the system I am also launching 4 instances of same test app under: /sys/fs/cgroup/cpu/

  3. We can see that both of the cgroups get similar CPU time:

    # systemd-cgtop --depth 1
    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /                659      -    5.5G        -         -
    /system.slice      -      -    5.7G        -         -
    /test_group_1      4      -       -        -         -
    /test_group_2      3      -       -        -         -
    /user.slice       31      -   56.5M        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /                659  394.6    5.5G        -         -
    /test_group_2      3   65.7       -        -         -
    /user.slice       29   55.1   48.0M        -         -
    /test_group_1      4   47.3       -        -         -
    /system.slice      -    2.2    5.7G        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /                659  394.8    5.5G        -         -
    /test_group_1      4   62.9       -        -         -
    /user.slice       28   44.9   54.2M        -         -
    /test_group_2      3   44.7       -        -         -
    /system.slice      -    0.9    5.7G        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /                659  394.4    5.5G        -         -
    /test_group_2      3   58.8       -        -         -
    /test_group_1      4   51.9       -        -         -
    /user.slice       30   39.3   59.6M        -         -
    /system.slice      -    1.9    5.7G        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /                659  394.7    5.5G        -         -
    /test_group_1      4   60.9       -        -         -
    /test_group_2      3   57.9       -        -         -
    /user.slice       28   43.5   36.9M        -         -
    /system.slice      -    3.0    5.7G        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /                659  395.0    5.5G        -         -
    /test_group_1      4   66.8       -        -         -
    /test_group_2      3   56.3       -        -         -
    /user.slice       29   43.1   51.8M        -         -
    /system.slice      -    0.7    5.7G        -         -

  4. Now move systemd-udevd to one of these test groups, say test_group_1, and perform scale up to 124 CPUs followed by scale down back to 4 CPUs from the host side.

  5. Run the same workload i.e 4 instances of CPU hogger under /sys/fs/cgroup/cpu and one instance of CPU hogger each in /sys/fs/cgroup/cpu/test_group_1 and /sys/fs/cgroup/test_group_2.

  It can be seen that test_group_1 (the one where systemd-udevd was moved) is getting much less CPU time than the test_group_2, even though at this point of time both of these groups have only CPU hogger running:

    # systemd-cgtop --depth 1
    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /               1219      -    5.4G        -         -
    /system.slice      -      -    5.6G        -         -
    /test_group_1      4      -       -        -         -
    /test_group_2      3      -       -        -         -
    /user.slice       26      -   91.3M        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /               1221  394.3    5.4G        -         -
    /test_group_2      3   82.7       -        -         -
    /test_group_1      4   14.3       -        -         -
    /system.slice      -    0.8    5.6G        -         -
    /user.slice       26    0.4   91.2M        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /               1221  394.6    5.4G        -         -
    /test_group_2      3   67.4       -        -         -
    /system.slice      -   24.6    5.6G        -         -
    /test_group_1      4   12.5       -        -         -
    /user.slice       26    0.4   91.2M        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /               1221  395.2    5.4G        -         -
    /test_group_2      3   60.9       -        -         -
    /system.slice      -   27.9    5.6G        -         -
    /test_group_1      4   12.2       -        -         -
    /user.slice       26    0.4   91.2M        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /               1221  395.2    5.4G        -         -
    /test_group_2      3   69.4       -        -         -
    /test_group_1      4   13.9       -        -         -
    /user.slice       28    1.6   92.0M        -         -
    /system.slice      -    1.0    5.6G        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /               1221  395.6    5.4G        -         -
    /test_group_2      3   59.3       -        -         -
    /test_group_1      4   14.1       -        -         -
    /user.slice       28    1.3   92.2M        -         -
    /system.slice      -    0.7    5.6G        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /               1221  395.5    5.4G        -         -
    /test_group_2      3   67.2       -        -         -
    /test_group_1      4   11.5       -        -         -
    /user.slice       28    1.3   92.5M        -         -
    /system.slice      -    0.6    5.6G        -         -

    Path           Tasks   %CPU  Memory  Input/s  Output/s
    /               1221  395.1    5.4G        -         -
    /test_group_2      3   76.8       -        -         -
    /test_group_1      4   12.9       -        -         -
    /user.slice       28    1.3   92.8M        -         -
    /system.slice      -    1.2    5.6G        -         -

  From sched_debug data it can be seen that in bad case the load.weight of per-CPU sched entities corresponding to test_group_1 has reduced significantly and also load_avg of test_group_1 remains much higher than that of test_group_2, even though systemd-udevd stopped running long time back and at this point of time both cgroups just have the CPU hogger app as running entity."

[ mingo: Added details from the original discussion, plus minor edits to the patch. ]

Reported-by: Imran Khan <[email protected]>
Tested-by: Imran Khan <[email protected]>
Tested-by: Aaron Lu <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Imran Khan <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2023-12-27 | Merge tag 'mm-hotfixes-stable-2023-12-27-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds | 1 | -0/+2

Pull misc fixes from Andrew Morton:
 "11 hotfixes. 7 are cc:stable and the other 4 address post-6.6 issues or are not considered backporting material"

* tag 'mm-hotfixes-stable-2023-12-27-15-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mailmap: add an old address for Naoya Horiguchi
  mm/memory-failure: cast index to loff_t before shifting it
  mm/memory-failure: check the mapcount of the precise page
  mm/memory-failure: pass the folio and the page to collect_procs()
  selftests: secretmem: floor the memory size to the multiple of page_size
  mm: migrate high-order folios in swap cache correctly
  maple_tree: do not preallocate nodes for slot stores
  mm/filemap: avoid buffered read/write race to read inconsistent data
  kunit: kasan_test: disable fortify string checker on kmalloc_oob_memset
  kexec: select CRYPTO from KEXEC_FILE instead of depending on it
  kexec: fix KEXEC_FILE dependencies
2023-12-27 | Kill sched.h dependency on rcupdate.h | Kent Overstreet | 1 | -0/+1

By moving cond_resched_rcu() to rcupdate_wait.h, we can kill another big sched.h dependency.

Signed-off-by: Kent Overstreet <[email protected]>
2023-12-27 | rseq: Split out rseq.h from sched.h | Kent Overstreet | 2 | -0/+2

We're trying to get sched.h down to more or less just types only, not code - rseq can live in its own header.

This helps us kill the dependency on preempt.h in sched.h.

Signed-off-by: Kent Overstreet <[email protected]>
2023-12-23 | sched/fair: Remove unused 'next_buddy_marked' local variable in check_preempt_wakeup_fair() | Wang Jinchao | 1 | -2/+0

This variable became unused in:

    5e963f2bd465 ("sched/fair: Commit to EEVDF")

Signed-off-by: Wang Jinchao <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2023-12-23 | sched/fair: Use all little CPUs for CPU-bound workloads | Pierre Gondois | 1 | -1/+1

Running N CPU-bound tasks on an N CPUs platform:

- with asymmetric CPU capacity
- not being a DynamIQ system (i.e. having a PKG level sched domain without the SD_SHARE_PKG_RESOURCES flag set)

might result in a task placement where two tasks run on a big CPU and none on a little CPU. This placement could be more optimal by using all CPUs.

Testing platform:

  Juno-r2:
    - 2 big CPUs (1-2), maximum capacity of 1024
    - 4 little CPUs (0,3-5), maximum capacity of 383

Testing workload ([1]):

  Spawn 6 CPU-bound tasks. During the first 100ms (step 1), each task is affine to a CPU, except for:
    - one little CPU which is left idle
    - one big CPU which has 2 tasks affine

  After the 100ms (step 2), remove the cpumask affinity.

Behavior before the patch:

  During step 2, the load balancer running from the idle CPU tags sched domains as:

  - little CPUs: 'group_has_spare'. Cf. group_has_capacity() and group_is_overloaded(): 3 CPU-bound tasks run on a 4 CPUs sched-domain, and the idle CPU provides enough spare capacity regarding the imbalance_pct.

  - big CPUs: 'group_overloaded'. Indeed, 3 tasks run on a 2 CPUs sched-domain, so the following path is used:

      group_is_overloaded()
      \-if (sgs->sum_nr_running <= sgs->group_weight)
            return true;

    The following path, which would change the migration type to 'migrate_task', is not taken:

      calculate_imbalance()
      \-if (env->idle != CPU_NOT_IDLE && env->imbalance == 0)

    as the local group has some spare capacity, so the imbalance is not 0.

  The migration type requested is 'migrate_util' and the busiest runqueue is the big CPU's runqueue having 2 tasks (each having a utilization of 512). The idle little CPU cannot pull one of these tasks as its capacity is too small for the task. The following path is used:

      detach_tasks()
      \-case migrate_util:
        \-if (util > env->imbalance)
              goto next;

After the patch:

  As the number of failed balancing attempts grows (with 'nr_balance_failed'), progressively make it easier to migrate a big task to the idling little CPU. A similar mechanism is used for the 'migrate_load' migration type.

Improvement:

  Running the testing workload [1] with the step 2 representing a ~10s load for a big CPU:

    Before patch: ~19.3s
    After patch:  ~18s (-6.7%)

Similar issue reported at: https://lore.kernel.org/lkml/[email protected]/

Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Pierre Gondois <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Acked-by: Qais Yousef <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
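A rough sketch of the relaxation described under "Behavior after the patch" (illustrative only; the exact condition in the merged patch may differ):

    case migrate_util:
            /* Scale util down as balancing keeps failing, so the idle
             * little CPU eventually qualifies to pull the big task. */
            if ((util >> env->sd->nr_balance_failed) > env->imbalance)
                    goto next;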
2023-12-23 | sched/fair: Simplify util_est | Vincent Guittot | 3 | -57/+36

With UTIL_EST_FASTUP now being permanent, we can take advantage of the fact that the ewma jumps directly to a higher utilization at dequeue to simplify util_est and remove the enqueued field.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Lukasz Luba <[email protected]>
Reviewed-by: Lukasz Luba <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Reviewed-by: Hongyan Xia <[email protected]>
Reviewed-by: Alex Shi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2023-12-23 | sched/fair: Remove SCHED_FEAT(UTIL_EST_FASTUP, true) | Vincent Guittot | 2 | -6/+3

sched_feat(UTIL_EST_FASTUP) has been added to easily disable the feature in order to check for possibly related regressions. After 3 years, it has never been used and no regression has been reported. Let's remove it and make fast increase a permanent behavior.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Lukasz Luba <[email protected]>
Reviewed-by: Lukasz Luba <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Reviewed-by: Hongyan Xia <[email protected]>
Reviewed-by: Tang Yizhou <[email protected]>
Reviewed-by: Yanteng Si <[email protected]> [for the Chinese translation]
Reviewed-by: Alex Shi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2023-12-23 | cpufreq/schedutil: Use a fixed reference frequency | Vincent Guittot | 1 | -2/+24

cpuinfo.max_freq can change at runtime, because of boost as an example. This implies that the value could be different from the one that was used when computing the capacity of a CPU.

The new arch_scale_freq_ref() returns a fixed and coherent reference frequency that can be used when computing a frequency based on utilization.

Use this arch_scale_freq_ref() when available, and fall back to the policy otherwise.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Lukasz Luba <[email protected]>
Reviewed-by: Lukasz Luba <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Acked-by: Rafael J. Wysocki <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
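A sketch of the selection logic described (arch_scale_freq_ref() as named in the changelog; the fallback shape and the use of map_util_freq() here are assumptions):

    unsigned long freq = arch_scale_freq_ref(policy->cpu);

    /* Fall back to the policy's view when the architecture provides
     * no fixed reference frequency. */
    if (!freq)
            freq = policy->cpuinfo.max_freq;

    next_freq = map_util_freq(util, freq, max_cap);  /* freq * util / max_cap */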
2023-12-23 | Merge tag 'v6.7-rc6' into sched/core, to pick up fixes | Ingo Molnar | 26 | -710/+650
Signed-off-by: Ingo Molnar <[email protected]>
2023-12-21 | entry: Move syscall_enter_from_user_mode() to header file | Sven Schnelle | 1 | -31/+1

To allow inlining of syscall_enter_from_user_mode(), move it to entry-common.h.

Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2023-12-21 | entry: Move enter_from_user_mode() to header file | Sven Schnelle | 1 | -23/+3

To allow inlining of enter_from_user_mode(), move it to entry-common.h.

Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2023-12-21 | entry: Move exit to usermode functions to header file | Sven Schnelle | 1 | -43/+9

To allow inlining, move exit_to_user_mode() to entry-common.h.

Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2023-12-21 | bpf: Avoid unnecessary use of comma operator in verifier | Simon Horman | 1 | -1/+1

Although it does not seem to have any untoward side-effects, the use of ';' to separate two assignments seems more appropriate than ','.

Flagged by clang-17 -Wcomma

No functional change intended. Compile tested only.

Signed-off-by: Simon Horman <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Dave Marchevsky <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
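A hypothetical illustration of the pattern (not the verifier's actual statement):

    a = x, b = y;   /* comma operator: well-defined, but flagged by -Wcomma */

    a = x;          /* separate statements: identical behavior, */
    b = y;          /* clearer intent */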
2023-12-21 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Paolo Abeni | 13 | -413/+144

Cross-merge networking fixes after downstream PR.

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
  23c93c3b6275 ("bnxt_en: do not map packet buffers twice")
  6d1add95536b ("bnxt_en: Modify TX ring indexing logic.")

tools/testing/selftests/net/Makefile
  2258b666482d ("selftests: add vlan hw filter tests")
  a0bc96c0cd6e ("selftests: net: verify fq per-band packet limit")

Signed-off-by: Paolo Abeni <[email protected]>
2023-12-21 | Merge tag 'trace-v6.7-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace | Linus Torvalds | 2 | -2/+13

Pull tracing fixes from Steven Rostedt:

 - Fix another kerneldoc warning

 - Fix eventfs files to inherit the ownership of their parent directory. The dynamic creation of dentries in eventfs did not take into account whether the tracefs file system was mounted with a gid/uid, and would still default to the gid/uid of root. This is a regression.

 - Fix warning when synthetic event testing is enabled along with startup event tracing testing

* tag 'trace-v6.7-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing / synthetic: Disable events after testing in synth_event_gen_test_init()
  eventfs: Have event files and directories default to parent uid and gid
  tracing/synthetic: fix kernel-doc warnings
2023-12-21 | ring-buffer: Use subbuf_order for buffer page masking | Steven Rostedt (Google) | 1 | -8/+11

The comparisons to PAGE_SIZE were all converted to use the buffer->subbuf_order, but the use of PAGE_MASK was missed.

Convert all the PAGE_MASK usages over to:

    (PAGE_SIZE << cpu_buffer->buffer->subbuf_order) - 1

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Kent Overstreet <[email protected]>
Fixes: 139f84002145 ("ring-buffer: Page size per ring buffer")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
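Usage sketch of the subbuf-aware mask (the local variable name is illustrative):

    unsigned long bsize_mask = (PAGE_SIZE << cpu_buffer->buffer->subbuf_order) - 1;

    addr &= ~bsize_mask;    /* align down to the start of a sub-buffer */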
2023-12-21 | tracing: Update subbuffer with kilobytes not page order | Steven Rostedt (Google) | 1 | -13/+25

Deciding the size of the ring buffer sub-buffers by page order exposes a bit too much of the implementation. Although the sub-buffers are only allocated in orders of pages, allow the user to specify the minimum size of each sub-buffer via kilobytes, like they can with the buffer size itself.

If the user specifies 3 via:

    echo 3 > buffer_subbuf_size_kb

then the sub-buffer size will round up to 4kb (on a 4kb page size system).

If they specify:

    echo 6 > buffer_subbuf_size_kb

the sub-buffer size will become 8kb, and so on.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
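A sketch of the rounding described, assuming a 4kb page size (names are illustrative; internally the value is kept as a page order):

    size_t bytes = kb << 10;                        /* user input in KiB */
    unsigned int pages = DIV_ROUND_UP(bytes, PAGE_SIZE);
    unsigned int order = get_count_order(pages);    /* round up to a power of two */
    /* kb=3 -> order 0 (4kb), kb=6 -> order 1 (8kb) */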
2023-12-21 | ring-buffer: Just update the subbuffers when changing their allocation order | Steven Rostedt (Google) | 1 | -17/+71

The ring_buffer_subbuf_order_set() was creating ring_buffer_per_cpu cpu_buffers with new subbuffers of the updated order, and if they were all successfully created, then the ring_buffer's per_cpu buffers would be freed and replaced by them.

The problem is that the freed per_cpu buffers contain state that would be lost. Running the following commands:

 1. # echo 3 > /sys/kernel/tracing/buffer_subbuf_order
 2. # echo 0 > /sys/kernel/tracing/tracing_cpumask
 3. # echo 1 > /sys/kernel/tracing/snapshot
 4. # echo ff > /sys/kernel/tracing/tracing_cpumask
 5. # echo test > /sys/kernel/tracing/trace_marker

would result in:

 -bash: echo: write error: Bad file descriptor

That's because the state of the per_cpu buffers of the snapshot buffer is lost when the order is changed (the order of a freed snapshot buffer goes to 0 to save memory, and when the snapshot buffer is allocated again, it goes back to what the main buffer is). In operation 2, the snapshot buffers were set to "disable" (as all the ring buffer CPUs were disabled). In operation 3, the snapshot is allocated and a call to ring_buffer_subbuf_order_set() replaced the per_cpu buffers, losing the "record_disable" count.

When it was enabled again, the atomic_dec(&cpu_buffer->record_disable) was decrementing a zero, setting it to -1. Writing 1 into the snapshot would swap the snapshot buffer with the main buffer, so now the main buffer is "disabled", and nothing can write to the ring buffer anymore.

Instead of creating new per_cpu buffers and losing the state of the old buffers, basically do what the resize does: just allocate new subbuf pages into the new_pages link list of the per_cpu buffer and, if they all succeed, replace the old sub buffers with the new ones. This keeps the per_cpu buffer descriptor intact and, by doing so, keeps its state.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Kent Overstreet <[email protected]>
Fixes: f9b94daa542a ("ring-buffer: Set new size of the ring buffer sub page")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-21 | ring-buffer: Keep the same size when updating the order | Steven Rostedt (Google) | 1 | -1/+4

The function ring_buffer_subbuf_order_set() just updated the sub-buffers to the new size, but in doing so it also changed the size of the buffer. As the size is determined by nr_pages * subbuf_size, increasing the subbuf_size without decreasing the nr_pages causes the total size of the buffer to increase.

This broke the latency tracers, as the snapshot needs to be the same size as the main buffer. The size of the snapshot buffer is only expanded when needed, and because the order is still the same, the size becomes out of sync with the main buffer, as the main buffer increased in size without the tracing system knowing.

Calculate the nr_pages to allocate with the new subbuf_size to be buffer_size / new_subbuf_size.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Kent Overstreet <[email protected]>
Fixes: f9b94daa542a ("ring-buffer: Set new size of the ring buffer sub page")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
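The size-preserving recalculation, sketched from the description above (names illustrative):

    /* Keep the total buffer size constant across an order change. */
    old_size = nr_pages * old_subbuf_size;
    nr_pages = DIV_ROUND_UP(old_size, new_subbuf_size);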
2023-12-21 | tracing: Stop the tracing while changing the ring buffer subbuf size | Steven Rostedt (Google) | 1 | -3/+10

For some tracers, the main buffer and the snapshot buffer need to be the same; otherwise the update will fail and disable all tracing. Stop the tracers while updating the sub-buffer sizes so that the tracers see the main and snapshot buffers with the same sub-buffer size.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Kent Overstreet <[email protected]>
Fixes: 2808e31ec12e ("ring-buffer: Add interface for configuring trace sub buffer size")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-21 | tracing: Update snapshot order along with main buffer order | Steven Rostedt (Google) | 1 | -2/+43

When updating the order of the sub buffers for the main buffer, make sure that if the snapshot buffer exists, it gets its order updated as well.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-21 | ring-buffer: Make sure the spare sub buffer used for reads has same size | Steven Rostedt (Google) | 1 | -0/+11

Now that the ring buffer specifies the size of its sub buffers, they all need to be the same size. When doing a read, a swap is done with a spare page. Make sure they are the same size before doing the swap, otherwise the read will fail.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-21 | ring-buffer: Do not swap cpu buffers if order is different | Steven Rostedt (Google) | 1 | -0/+3

As all the subbuffer orders (subbuffer sizes) must be the same throughout the ring buffer, check the order of the buffers that are doing a CPU buffer swap in ring_buffer_swap_cpu() to make sure they are the same. If they are not the same, fail to do the swap; otherwise the ring buffer will think the CPU buffer has a specific subbuffer size when it does not.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Kent Overstreet <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-21 | ring-buffer: Clear pages on error in ring_buffer_subbuf_order_set() failure | Steven Rostedt (Google) | 1 | -0/+1

On failure to allocate ring buffer pages, the pointer to the CPU buffer pages is freed, but the pages that were allocated previously were not. Make sure they are freed too.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Vincent Donnefort <[email protected]>
Cc: Kent Overstreet <[email protected]>
Fixes: f9b94daa542a ("tracing: Set new size of the ring buffer sub page")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-21 | tracing / synthetic: Disable events after testing in synth_event_gen_test_init() | Steven Rostedt (Google) | 1 | -0/+11

The synth_event_gen_test module can be built in if someone wants to run the tests at boot up and not have to load them.

The synth_event_gen_test_init() function creates and enables the synthetic events and runs its tests.

The synth_event_gen_test_exit() disables the events it created and destroys the events.

If the module is builtin, the events are never disabled. The issue is that the events should be disabled after the tests are run. This could be an issue if the rest of the boot up tests are enabled, as they expect the events to be in a known state before testing. That known state happens to be disabled.

When CONFIG_SYNTH_EVENT_GEN_TEST=y and CONFIG_EVENT_TRACE_STARTUP_TEST=y a warning will trigger:

 Running tests on trace events:
 Testing event create_synth_test:
 Enabled event during self test!
 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1 at kernel/trace/trace_events.c:4150 event_trace_self_tests+0x1c2/0x480
 Modules linked in:
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc2-test-00031-gb803d7c664d5-dirty #276
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:event_trace_self_tests+0x1c2/0x480
 Code: bb e8 a2 ab 5d fc 48 8d 7b 48 e8 f9 3d 99 fc 48 8b 73 48 40 f6 c6 01 0f 84 d6 fe ff ff 48 c7 c7 20 b6 ad bb e8 7f ab 5d fc 90 <0f> 0b 90 48 89 df e8 d3 3d 99 fc 48 8b 1b 4c 39 f3 0f 85 2c ff ff
 RSP: 0000:ffffc9000001fdc0 EFLAGS: 00010246
 RAX: 0000000000000029 RBX: ffff88810399ca80 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffffffffb9f19478 RDI: ffff88823c734e64
 RBP: ffff88810399f300 R08: 0000000000000000 R09: fffffbfff79eb32a
 R10: ffffffffbcf59957 R11: 0000000000000001 R12: ffff888104068090
 R13: ffffffffbc89f0a0 R14: ffffffffbc8a0f08 R15: 0000000000000078
 FS:  0000000000000000(0000) GS:ffff88823c700000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001f6282001 CR4: 0000000000170ef0
 Call Trace:
  <TASK>
  ? __warn+0xa5/0x200
  ? event_trace_self_tests+0x1c2/0x480
  ? report_bug+0x1f6/0x220
  ? handle_bug+0x6f/0x90
  ? exc_invalid_op+0x17/0x50
  ? asm_exc_invalid_op+0x1a/0x20
  ? tracer_preempt_on+0x78/0x1c0
  ? event_trace_self_tests+0x1c2/0x480
  ? __pfx_event_trace_self_tests_init+0x10/0x10
  event_trace_self_tests_init+0x27/0xe0
  do_one_initcall+0xd6/0x3c0
  ? __pfx_do_one_initcall+0x10/0x10
  ? kasan_set_track+0x25/0x30
  ? rcu_is_watching+0x38/0x60
  kernel_init_freeable+0x324/0x450
  ? __pfx_kernel_init+0x10/0x10
  kernel_init+0x1f/0x1e0
  ? _raw_spin_unlock_irq+0x33/0x50
  ret_from_fork+0x34/0x60
  ? __pfx_kernel_init+0x10/0x10
  ret_from_fork_asm+0x1b/0x30
  </TASK>

This is because synth_event_gen_test_init() left the synthetic events that it created enabled. By having it disable them after testing, the other selftests will run fine.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Mathieu Desnoyers <[email protected]>
Cc: Tom Zanussi <[email protected]>
Fixes: 9fe41efaca084 ("tracing: Add synth event generation test module")
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Reported-by: Alexander Graf <[email protected]>
Tested-by: Alexander Graf <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-12-21 | bpf: Re-support uid and gid when mounting bpffs | Daniel Borkmann | 1 | -2/+51

For a clean, conflict-free revert of the token-related patches in commit d17aff807f84 ("Revert BPF token-related functionality"), the bpf fs commit 750e785796bb ("bpf: Support uid and gid when mounting bpffs") was undone temporarily as well.

This patch manually re-adds the functionality from the original one back in 750e785796bb, no other functional changes intended.

Testing:

  # mount -t bpf -o uid=65534,gid=65534 bpffs ./foo
  # ls -la . | grep foo
  drwxrwxrwt 2 nobody nogroup 0 Dec 20 13:16 foo
  # mount -t bpf
  bpffs on /root/foo type bpf (rw,relatime,uid=65534,gid=65534)

Also, passing invalid arguments for uid/gid is properly rejected as expected.

Fixes: d17aff807f84 ("Revert BPF token-related functionality")
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Cc: Jie Jiang <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/bpf/[email protected]
2023-12-21 | Merge branch 'vfs.file' | Christian Brauner | 2 | -2/+2

Bring in the changes to the file infrastructure for this cycle. Mostly cleanups and some performance tweaks.

* file: remove __receive_fd()
* file: stop exposing receive_fd_user()
* fs: replace f_rcuhead with f_task_work
* file: remove pointless wrapper
* file: s/close_fd_get_file()/file_close_fd()/g
* Improve __fget_files_rcu() code generation (and thus __fget_light())
* file: massage cleanup of files that failed to open

Signed-off-by: Christian Brauner <[email protected]>
2023-12-21 | watch_queue: fix kcalloc() arguments order | Dmitry Antipov | 1 | -1/+1

When compiling with gcc version 14.0.0 20231220 (experimental) and W=1, I've noticed the following warning:

kernel/watch_queue.c: In function 'watch_queue_set_size':
kernel/watch_queue.c:273:32: warning: 'kcalloc' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Wcalloc-transposed-args]
  273 |         pages = kcalloc(sizeof(struct page *), nr_pages, GFP_KERNEL);
      |                                ^~~~~~

Since the 'n' and 'size' arguments of 'kcalloc()' are multiplied to calculate the final size, their actual order doesn't affect the result, and so this is not a bug. But it's still worth fixing.

Signed-off-by: Dmitry Antipov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
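The corrected call, with the arguments in the order the API defines (count first, then element size):

    pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);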
2023-12-20 | posix-timers: Get rid of [COMPAT_]SYS_NI() uses | Linus Torvalds | 2 | -45/+14

Only the posix timer system calls use this (when the posix timer support is disabled, which does not actually happen in any normal case), because they had debug code to print out a warning about missing system calls.

Get rid of that special case, and just use the standard COND_SYSCALL interface that creates weak system call stubs that return -ENOSYS when the system call does not exist.

This fixes a kCFI issue with the SYS_NI() hackery:

  CFI failure at int80_emulation+0x67/0xb0 (target: sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
  WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0

Reported-by: kernel test robot <[email protected]>
Reviewed-by: Sami Tolvanen <[email protected]>
Tested-by: Sami Tolvanen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Borislav Petkov <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
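A simplified sketch of what COND_SYSCALL provides (the kernel's real macro has per-architecture variants):

    /* Weak stub: the real sys_<name> overrides it when the syscall is
     * built in; otherwise callers get -ENOSYS. */
    #define COND_SYSCALL(name)                                            \
            asmlinkage long __weak sys_##name(const struct pt_regs *regs) \
            {                                                             \
                    return -ENOSYS;                                       \
            }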
2023-12-20 | wait: Remove uapi header file from main header file | Matthew Wilcox (Oracle) | 2 | -1/+4

There's really no overlap between uapi/linux/wait.h and linux/wait.h. There are two files which rely on the uapi file being implicitly included, so explicitly include it there and remove it from the main header file.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
2023-12-20 | plist: Split out plist_types.h | Kent Overstreet | 3 | -0/+3

Trimming down sched.h dependencies: we don't want to include more than the base types.

Signed-off-by: Kent Overstreet <[email protected]>
2023-12-20 | sched.h: move pid helpers to pid.h | Kent Overstreet | 2 | -2/+4

This is needed for killing the sched.h dependency on rcupdate.h, and pid.h is a better place for this code anyway.

Signed-off-by: Kent Overstreet <[email protected]>
2023-12-20 | kernel/numa.c: Move logging out of numa.h | Kent Overstreet | 2 | -0/+27

Moving these stub functions to a .c file means we can kill a sched.h dependency on printk.h.

Signed-off-by: Kent Overstreet <[email protected]>
2023-12-20 | kernel/fork.c: add missing include | Kent Overstreet | 1 | -0/+1
Signed-off-by: Kent Overstreet <[email protected]>
2023-12-20 | crash_core: remove duplicated including of kexec.h | Wang Jinchao | 1 | -1/+0

Remove the second include of linux/kexec.h.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wang Jinchao <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Dave Young <[email protected]>
Cc: Vivek Goyal <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-12-20 | kexec: use ALIGN macro instead of open-coding it | Yuntao Wang | 1 | -2/+2

Use the ALIGN macro instead of open-coding it to improve code readability.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yuntao Wang <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
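Illustration of the readability change (generic form; the actual operands live in the kexec code):

    buf = (buf + align - 1) & ~(align - 1);  /* open-coded round-up */
    buf = ALIGN(buf, align);                 /* same result, self-describing */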
2023-12-20 | fork: remove redundant TASK_UNINTERRUPTIBLE | Kevin Hao | 1 | -1/+1

TASK_KILLABLE already includes TASK_UNINTERRUPTIBLE, so there is no need to add a separate TASK_UNINTERRUPTIBLE.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kevin Hao <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
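For reference, the definition that makes the extra flag redundant, as in include/linux/sched.h:

    #define TASK_KILLABLE  (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)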
2023-12-20 | kexec_file: print out debugging message if required | Baoquan He | 2 | -6/+13

Now, when '-d' is specified for the kexec_file_load interface, the loaded locations of kernel/initrd/cmdline etc. can be printed out to help debugging.

Here, replace pr_debug() with the newly added kexec_dprintk() in the kexec_file loading related code. Also print out the type/start/head of kimage and the flags to help debugging.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Cc: Conor Dooley <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-12-20 | kexec_file: add kexec_file flag to control debug printing | Baoquan He | 2 | -0/+5

Patch series "kexec_file: print out debugging message if required", v4.

Currently, specifying '-d' on the kexec command will print a lot of debugging information about kexec/kdump loading with the kexec_load interface.

However, kexec_file_load prints nothing even though '-d' is specified. It's very inconvenient to debug or analyze kexec/kdump loading when something goes wrong with kexec/kdump itself or a developer wants to check the kexec/kdump loading.

In this patchset, a kexec_file flag, KEXEC_FILE_DEBUG, is added and checked in the code. If it's passed in, debugging messages of the kexec_file code will be printed out and can be seen from the console and dmesg. Otherwise, the debugging messages are printed as before when pr_debug() is taken.

Note:
=====
1) The code in the kexec-tools utility also needs to be changed to support passing KEXEC_FILE_DEBUG to the kernel when 'kexec -s -d' is specified. The patch link is here:

   [PATCH] kexec_file: add kexec_file flag to support debug printing
   http://lists.infradead.org/pipermail/kexec/2023-November/028505.html

2) s390 also has kexec_file code, while I am not sure what debugging information is necessary. So leave it to the s390 developers.

Test:
=====
Testing was done in v1 on x86_64 and arm64. For v4, tested on x86_64 again. And on x86_64, the printed messages look like below:

--------------------------------------------------------------
kexec measurement buffer for the loaded kernel at 0x207fffe000.
Loaded purgatory at 0x207fff9000
Loaded boot_param, command line and misc at 0x207fff3000 bufsz=0x1180 memsz=0x1180
Loaded 64bit kernel at 0x207c000000 bufsz=0xc88200 memsz=0x3c4a000
Loaded initrd at 0x2079e79000 bufsz=0x2186280 memsz=0x2186280
Final command line is: root=/dev/mapper/fedora_intel--knightslanding--lb--02-root ro rd.lvm.lv=fedora_intel-knightslanding-lb-02/root console=ttyS0,115200N81 crashkernel=256M
E820 memmap:
0000000000000000-000000000009a3ff (1)
000000000009a400-000000000009ffff (2)
00000000000e0000-00000000000fffff (2)
0000000000100000-000000006ff83fff (1)
000000006ff84000-000000007ac50fff (2)
......
000000207fff6150-000000207fff615f (128)
000000207fff6160-000000207fff714f (1)
000000207fff7150-000000207fff715f (128)
000000207fff7160-000000207fff814f (1)
000000207fff8150-000000207fff815f (128)
000000207fff8160-000000207fffffff (1)
nr_segments = 5
segment[0]: buf=0x000000004e5ece74 bufsz=0x211 mem=0x207fffe000 memsz=0x1000
segment[1]: buf=0x000000009e871498 bufsz=0x4000 mem=0x207fff9000 memsz=0x5000
segment[2]: buf=0x00000000d879f1fe bufsz=0x1180 mem=0x207fff3000 memsz=0x2000
segment[3]: buf=0x000000001101cd86 bufsz=0xc88200 mem=0x207c000000 memsz=0x3c4a000
segment[4]: buf=0x00000000c6e38ac7 bufsz=0x2186280 mem=0x2079e79000 memsz=0x2187000
kexec_file_load: type:0, start:0x207fff91a0 head:0x109e004002 flags:0x8
---------------------------------------------------------------------------

This patch (of 7):

When specifying 'kexec -c -d', the kexec_load interface will print loading information, e.g. the regions where kernel/initrd/purgatory/cmdline are put, the memmap passed to the 2nd kernel taken as system RAM ranges, and all contents of struct kexec_segment, etc. These are very helpful for analyzing or locating what's happening when kexec/kdump itself fails.

The debugging printing for the kexec_load interface is done in the user space utility kexec-tools. Whereas, with the kexec_file_load interface, 'kexec -s -d' prints nothing, because the kexec_file code is mostly implemented in kernel space and the debugging printing functionality is missing.

It's not convenient when debugging kexec/kdump loading and jumping with the kexec_file_load interface.

Now add KEXEC_FILE_DEBUG to the kexec_file flags to control the debugging message printing. And add the global variable kexec_file_dbg_print and the macro kexec_dprintk() to facilitate the printing.

This is a preparation; later, kexec_dprintk() will be used to replace the existing pr_debug(). Once 'kexec -s -d' is specified, it will print out kexec/kdump loading information. If '-d' is not specified, it regresses to pr_debug().

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Baoquan He <[email protected]>
Cc: Conor Dooley <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: Nathan Chancellor <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
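A minimal sketch of the mechanism described (the merged macro may differ in detail):

    extern bool kexec_file_dbg_print;

    /* Promote to pr_info() when 'kexec -s -d' passed KEXEC_FILE_DEBUG,
     * otherwise behave like the pr_debug() it replaces. */
    #define kexec_dprintk(fmt, ...)                         \
    do {                                                    \
            if (kexec_file_dbg_print)                       \
                    pr_info(fmt, ##__VA_ARGS__);            \
            else                                            \
                    pr_debug(fmt, ##__VA_ARGS__);           \
    } while (0)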
2023-12-20 | sync mm-stable with mm-hotfixes-stable to pick up depended-upon changes | Andrew Morton | 2 | -6/+7
2023-12-20 | kexec: select CRYPTO from KEXEC_FILE instead of depending on it | Arnd Bergmann | 1 | -1/+2

All other users of crypto code use 'select' instead of 'depends on', so do the same thing with KEXEC_FILE for consistency.

In practice this makes very little difference, as kernels with kexec support are very likely to also include some other feature that already selects both crypto and crypto_sha256, but being consistent here helps for usability as well as to avoid potential circular dependencies.

This reverts the dependency back to what it was originally before commit 74ca317c26a3f ("kexec: create a new config option CONFIG_KEXEC_FILE for new syscall"), which changed it with the comment "This should be safer as "select" is not recursive", but that appears to have been done in error, as "select" is indeed recursive, and there are no other dependencies that prevent CRYPTO_SHA256 from being selected here.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 74ca317c26a3f ("kexec: create a new config option CONFIG_KEXEC_FILE for new syscall")
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Eric DeVolder <[email protected]>
Tested-by: Eric DeVolder <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Conor Dooley <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
2023-12-20 | kexec: fix KEXEC_FILE dependencies | Arnd Bergmann | 1 | -0/+1

The cleanup for the CONFIG_KEXEC Kconfig logic accidentally changed the 'depends on CRYPTO=y' dependency to a plain 'depends on CRYPTO', which causes a link failure when all the crypto support is in a loadable module and kexec_file support is built-in:

  x86_64-linux-ld: vmlinux.o: in function `__x64_sys_kexec_file_load':
  (.text+0x32e30a): undefined reference to `crypto_alloc_shash'
  x86_64-linux-ld: (.text+0x32e58e): undefined reference to `crypto_shash_update'
  x86_64-linux-ld: (.text+0x32e6ee): undefined reference to `crypto_shash_final'

Both s390 and x86 have this problem, while ppc64 and riscv have the correct dependency already. On riscv, the dependency is only used for the purgatory, not for the kexec_file code itself, which may be a bit surprising, as it means that with CONFIG_CRYPTO=m it is possible to enable KEXEC_FILE, but then the purgatory code is silently left out.

Move this into the common Kconfig.kexec file in a way that is correct everywhere, using the dependency on CRYPTO_SHA256=y only when the purgatory code is available. This requires reversing the dependency between ARCH_SUPPORTS_KEXEC_PURGATORY and KEXEC_FILE, but the effect remains the same, other than making riscv behave like the other ones.

On s390, there is an additional dependency on CRYPTO_SHA256_S390, which should technically not be required but gives better performance. Remove this dependency here, noting that it was not present in the initial Kconfig code but was brought in without an explanation in commit 71406883fd357 ("s390/kexec_file: Add kexec_file_load system call").

[[email protected]: fix riscv build]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6af5138083005 ("x86/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Eric DeVolder <[email protected]>
Tested-by: Eric DeVolder <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Conor Dooley <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sven Schnelle <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>