blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-08-21	kernel/fork: stop playing lockless games for exe_file replacement	Mateusz Guzik	1	-13/+9
	xchg originated in 6e399cd144d8 ("prctl: avoid using mmap_sem for exe_file serialization"). While the commit message does not explain why the change, I found the original submission [1] which ultimately claims it cleans things up by removing dependency of exe_file on the semaphore. However, fe69d560b5bd ("kernel/fork: always deny write access to current MM exe_file") added a semaphore up/down cycle to synchronize the state of exe_file against fork, defeating the point of the original change. This is on top of semaphore trips already present both in the replacing function and prctl (the only consumer). Normally replacing exe_file does not happen for busy processes, thus write-locking is not an impediment to performance in the intended use case. If someone keeps invoking the routine for a busy processes they are trying to play dirty and that's another reason to avoid any trickery. As such I think the atomic here only adds complexity for no benefit. Just write-lock around the replacement. I also note that replacement races against the mapping check loop as nothing synchronizes actual assignment with with said checks but I am not addressing it in this patch. (Is the loop of any use to begin with?) Link: https://lore.kernel.org/linux-mm/[email protected]/ [1] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mateusz Guzik <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: "Christian Brauner (Microsoft)" <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mateusz Guzik <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-21	memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy	Aleksa Sarai	3	-19/+18
	This sysctl has the very unusual behaviour of not allowing any user (even CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you were to set this sysctl to a more restrictive option in the host pidns you would need to reboot your machine in order to reset it. The justification given in [1] is that this is a security feature and thus it should not be possible to disable. Aside from the fact that we have plenty of security-related sysctls that can be disabled after being enabled (fs.protected_symlinks for instance), the protection provided by the sysctl is to stop users from being able to create a binary and then execute it. A user with CAP_SYS_ADMIN can trivially do this without memfd_create(2): % cat mount-memfd.c #include <fcntl.h> #include <string.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <linux/mount.h> #define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:" int main(void) { int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC); assert(fsfd >= 0); assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 2)); int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0); assert(dfd >= 0); int execfd = openat(dfd, "exe", O_CREAT \| O_RDWR \| O_CLOEXEC, 0782); assert(execfd >= 0); assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE)); assert(!close(execfd)); char execpath = NULL; char argv[] = { "bad-exe", NULL }, envp[] = { NULL }; execfd = openat(dfd, "exe", O_PATH \| O_CLOEXEC); assert(execfd >= 0); assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0); assert(!execve(execpath, argv, envp)); } % ./mount-memfd this file was executed from this totally private tmpfs: /proc/self/fd/5 % Given that it is possible for CAP_SYS_ADMIN users to create executable binaries without memfd_create(2) and without touching the host filesystem (not to mention the many other things a CAP_SYS_ADMIN process would be able to do that would be equivalent or worse), it seems strange to cause a fair amount of headache to admins when there doesn't appear to be an actual security benefit to blocking this. There appear to be concerns about confused-deputy-esque attacks[2] but a confused deputy that can write to arbitrary sysctls is a bigger security issue than executable memfds. / New API / The primary requirement from the original author appears to be more based on the need to be able to restrict an entire system in a hierarchical manner[3], such that child namespaces cannot re-enable executable memfds. So, implement that behaviour explicitly -- the vm.memfd_noexec scope is evaluated up the pidns tree to &init_pid_ns and you have the most restrictive value applied to you. The new lower limit you can set vm.memfd_noexec is whatever limit applies to your parent. Note that a pidns will inherit a copy of the parent pidns's effective vm.memfd_noexec setting at unshare() time. This matches the existing behaviour, and it also ensures that a pidns will never have its vm.memfd_noexec setting lowered* behind its back (but it will be raised if the parent raises theirs). /* Backwards Compatibility / As the previous version of the sysctl didn't allow you to lower the setting at all, there are no backwards compatibility issues with this aspect of the change. However it should be noted that now that the setting is completely hierarchical. Previously, a cloned pidns would just copy the current pidns setting, meaning that if the parent's vm.memfd_noexec was changed it wouldn't propoagate to existing pid namespaces. Now, the restriction applies recursively. This is a uAPI change, however: The sysctl is very new, having been merged in 6.3. * Several aspects of the sysctl were broken up until this patchset and the other patchset by Jeff Xu last month. And thus it seems incredibly unlikely that any real users would run into this issue. In the worst case, if this causes userspace isues we could make it so that modifying the setting follows the hierarchical rules but the restriction checking uses the cached copy. [1]: https://lore.kernel.org/CABi2SkWnAgHK1i6iqSqPMYuNEhtHBkO8jUuCvmG3RmUB5TKHJw@mail.gmail.com/ [2]: https://lore.kernel.org/CALmYWFs_dNCzw_pW1yRAo4bGCPEtykroEQaowNULp7svwMLjOg@mail.gmail.com/ [3]: https://lore.kernel.org/CALmYWFuahdUF7cT4cm7_TGLqPanuHXJ-hVSfZt7vpTnc18DPrw@mail.gmail.com/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") Signed-off-by: Aleksa Sarai <[email protected]> Cc: Dominique Martinet <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Daniel Verkamp <[email protected]> Cc: Jeff Xu <[email protected]> Cc: Kees Cook <[email protected]> Cc: Shuah Khan <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-21	perf/core: use vma_is_initial_stack() and vma_is_initial_heap()	Kefeng Wang	1	-22/+11
	Use the helpers to simplify code, also kill unneeded goto cpy_name. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Christian Göttsche <[email protected]> Cc: "Christian König" <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: David Airlie <[email protected]> Cc: Eric Paris <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: "Pan, Xinhui" <[email protected]> Cc: Paul Moore <[email protected]> Cc: Stephen Smalley <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-21	kernel/iomem.c: remove __weak ioremap_cache helper	Arnd Bergmann	1	-8/+4
	No portable code calls into this function any more, and on architectures that don't use or define their own, it causes a warning: kernel/iomem.c:10:22: warning: no previous prototype for 'ioremap_cache' [-Wmissing-prototypes] 10 \| __weak void __iomem *ioremap_cache(resource_size_t offset, unsigned long size) Fold it into the only caller that uses it on architectures without the #define. Note that the fallback to ioremap is probably still wrong on those architectures, but this is what it's always done there. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-21	nsproxy: Convert nsproxy.count to refcount_t	Elena Reshetova	1	-2/+2
	atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nsproxy.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in refcount.h have different memory ordering guarantees than their atomic counterparts. Please check Documentation/core-api/refcount-vs-atomic.rst for more information. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nsproxy.count it might make a difference in following places: - put_nsproxy() and switch_task_namespaces(): decrement in refcount_dec_and_test() only provides RELEASE ordering and ACQUIRE ordering on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Reviewed-by: David Windsor <[email protected]> Reviewed-by: Hans Liljestrand <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2023-08-21	tracing: Introduce pipe_cpumask to avoid race on trace_pipes	Zheng Yejian	2	-7/+50
	There is race issue when concurrently splice_read main trace_pipe and per_cpu trace_pipes which will result in data read out being different from what actually writen. As suggested by Steven: > I believe we should add a ref count to trace_pipe and the per_cpu > trace_pipes, where if they are opened, nothing else can read it. > > Opening trace_pipe locks all per_cpu ref counts, if any of them are > open, then the trace_pipe open will fail (and releases any ref counts > it had taken). > > Opening a per_cpu trace_pipe will up the ref count for just that > CPU buffer. This will allow multiple tasks to read different per_cpu > trace_pipe files, but will prevent the main trace_pipe file from > being opened. But because we only need to know whether per_cpu trace_pipe is open or not, using a cpumask instead of using ref count may be easier. After this patch, users will find that: - Main trace_pipe can be opened by only one user, and if it is opened, all per_cpu trace_pipes cannot be opened; - Per_cpu trace_pipes can be opened by multiple users, but each per_cpu trace_pipe can only be opened by one user. And if one of them is opened, main trace_pipe cannot be opened. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Suggested-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Zheng Yejian <[email protected]> Reviewed-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-08-20	Merge commit b320441c04c9 ("Merge tag 'tty-6.5-rc7' of ↵	Greg Kroah-Hartman	4	-21/+76
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty") into tty-next We need the serial-core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-08-18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	1	-1/+1
	Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/sfc/tc.c fa165e194997 ("sfc: don't unregister flow_indr if it was never registered") 3bf969e88ada ("sfc: add MAE table machinery for conntrack table") https://lore.kernel.org/all/[email protected]/ No adjacent changes. Signed-off-by: Jakub Kicinski <[email protected]>
2023-08-18	watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check()	Douglas Anderson	1	-7/+2
	After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started storing a `struct cpumask` on the stack in watchdog_hardlockup_check(). On systems with CONFIG_NR_CPUS set to 8192 this takes up 1K on the stack. That triggers warnings with `CONFIG_FRAME_WARN` set to 1024. We'll use the new trigger_allbutcpu_cpu_backtrace() to avoid needing to use a CPU mask at all. Link: https://lkml.kernel.org/r/20230804065935.v4.2.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") Signed-off-by: Douglas Anderson <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/r/[email protected] Acked-by: Michal Hocko <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Cc: Lecopzer Chen <[email protected]> Cc: Pingfan Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	nmi_backtrace: allow excluding an arbitrary CPU	Douglas Anderson	1	-1/+1
	The APIs that allow backtracing across CPUs have always had a way to exclude the current CPU. This convenience means callers didn't need to find a place to allocate a CPU mask just to handle the common case. Let's extend the API to take a CPU ID to exclude instead of just a boolean. This isn't any more complex for the API to handle and allows the hardlockup detector to exclude a different CPU (the one it already did a trace for) without needing to find space for a CPU mask. Arguably, this new API also encourages safer behavior. Specifically if the caller wants to avoid tracing the current CPU (maybe because they already traced the current CPU) this makes it more obvious to the caller that they need to make sure that the current CPU ID can't change. [[email protected]: fix trigger_allbutcpu_cpu_backtrace() stub] Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid Signed-off-by: Douglas Anderson <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: kernel test robot <[email protected]> Cc: Lecopzer Chen <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Pingfan Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	kthread: unexport __kthread_should_park()	Greg Kroah-Hartman	1	-2/+1
	There are no in-kernel users of __kthread_should_park() so mark it as static and do not export it. Link: https://lkml.kernel.org/r/2023080450-handcuff-stump-1d6e@gregkh Signed-off-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: John Stultz <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: "Arve Hjønnevåg" <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "Christian Brauner (Microsoft)" <[email protected]> Cc: Mike Christie <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Zqiang <[email protected]> Cc: Prathu Baronia <[email protected]> Cc: Sami Tolvanen <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	gcov: shut up missing prototype warnings for internal stubs	Arnd Bergmann	1	-0/+2
	gcov uses global functions that are called from generated code, but these have no prototype in a header, which causes a W=1 build warning: kernel/gcov/gcc_base.c:12:6: error: no previous prototype for '__gcov_init' [-Werror=missing-prototypes] kernel/gcov/gcc_base.c:40:6: error: no previous prototype for '__gcov_flush' [-Werror=missing-prototypes] kernel/gcov/gcc_base.c:46:6: error: no previous prototype for '__gcov_merge_add' [-Werror=missing-prototypes] kernel/gcov/gcc_base.c:52:6: error: no previous prototype for '__gcov_merge_single' [-Werror=missing-prototypes] Just turn off these warnings unconditionally for the two files that contain them. Link: https://lore.kernel.org/all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Tested-by: Peter Oberparleiter <[email protected]> Acked-by: Peter Oberparleiter <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	kernel: relay: remove unnecessary NULL values from relay_open_buf	Li kunyu	1	-1/+1
	buf is assigned first, so it does not need to initialize the assignment. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Li kunyu <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	remove ARCH_DEFAULT_KEXEC from Kconfig.kexec	Eric DeVolder	1	-1/+0
	This patch is a minor cleanup to the series "refactor Kconfig to consolidate KEXEC and CRASH options". In that series, a new option ARCH_DEFAULT_KEXEC was introduced in order to obtain the equivalent behavior of s390 original Kconfig settings for KEXEC. As it turns out, this new option did not fully provide the equivalent behavior, rather a "select KEXEC" did. As such, the ARCH_DEFAULT_KEXEC is not needed anymore, so remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric DeVolder <[email protected]> Acked-by: Alexander Gordeev <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	kexec: rename ARCH_HAS_KEXEC_PURGATORY	Eric DeVolder	1	-3/+3
	The Kconfig refactor to consolidate KEXEC and CRASH options utilized option names of the form ARCH_SUPPORTS_<option>. Thus rename the ARCH_HAS_KEXEC_PURGATORY to ARCH_SUPPORTS_KEXEC_PURGATORY to follow the same. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric DeVolder <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	kexec: consolidate kexec and crash options into kernel/Kconfig.kexec	Eric DeVolder	1	-0/+116
	Patch series "refactor Kconfig to consolidate KEXEC and CRASH options", v6. The Kconfig is refactored to consolidate KEXEC and CRASH options from various arch/<arch>/Kconfig files into new file kernel/Kconfig.kexec. The Kconfig.kexec is now a submenu titled "Kexec and crash features" located under "General Setup". The following options are impacted: - KEXEC - KEXEC_FILE - KEXEC_SIG - KEXEC_SIG_FORCE - KEXEC_IMAGE_VERIFY_SIG - KEXEC_BZIMAGE_VERIFY_SIG - KEXEC_JUMP - CRASH_DUMP Over time, these options have been copied between Kconfig files and are very similar to one another, but with slight differences. The following architectures are impacted by the refactor (because of use of one or more KEXEC/CRASH options): - arm - arm64 - ia64 - loongarch - m68k - mips - parisc - powerpc - riscv - s390 - sh - x86 More information: In the patch series "crash: Kernel handling of CPU and memory hot un/plug" https://lore.kernel.org/lkml/[email protected]/ the new kernel feature introduces the config option CRASH_HOTPLUG. In reviewing, Thomas Gleixner requested that the new config option not be placed in x86 Kconfig. Rather the option needs a generic/common home. To Thomas' point, the KEXEC and CRASH options have largely been duplicated in the various arch/<arch>/Kconfig files, with minor differences. This kind of proliferation is to be avoid/stopped. https://lore.kernel.org/lkml/875y91yv63.ffs@tglx/ To that end, I have refactored the arch Kconfigs so as to consolidate the various KEXEC and CRASH options. Generally speaking, this work has the following themes: - KEXEC and CRASH options are moved into new file kernel/Kconfig.kexec - These items from arch/Kconfig: CRASH_CORE KEXEC_CORE KEXEC_ELF HAVE_IMA_KEXEC - These items from arch/x86/Kconfig form the common options: KEXEC KEXEC_FILE KEXEC_SIG KEXEC_SIG_FORCE KEXEC_BZIMAGE_VERIFY_SIG KEXEC_JUMP CRASH_DUMP - These items from arch/arm64/Kconfig form the common options: KEXEC_IMAGE_VERIFY_SIG - The crash hotplug series appends CRASH_HOTPLUG to Kconfig.kexec - The Kconfig.kexec is now a submenu titled "Kexec and crash features" and is now listed in "General Setup" submenu from init/Kconfig. - To control the common options, each has a new ARCH_SUPPORTS_<option> option. These gateway options determine whether the common options options are valid for the architecture. - To account for the slight differences in the original architecture coding of the common options, each now has a corresponding ARCH_SELECTS_<option> which are used to elicit the same side effects as the original arch/<arch>/Kconfig files for KEXEC and CRASH options. An example, 'make menuconfig' illustrating the submenu: > General setup > Kexec and crash features [] Enable kexec system call [] Enable kexec file based system call [] Verify kernel signature during kexec_file_load() syscall [ ] Require a valid signature in kexec_file_load() syscall [ ] Enable bzImage signature verification support [] kexec jump [] kernel crash dumps [] Update the crash elfcorehdr on system configuration changes In the process of consolidating the common options, I encountered slight differences in the coding of these options in several of the architectures. As a result, I settled on the following solution: - Each of the common options has a 'depends on ARCH_SUPPORTS_<option>' statement. For example, the KEXEC_FILE option has a 'depends on ARCH_SUPPORTS_KEXEC_FILE' statement. This approach is needed on all common options so as to prevent options from appearing for architectures which previously did not allow/enable them. For example, arm supports KEXEC but not KEXEC_FILE. The arch/arm/Kconfig does not provide ARCH_SUPPORTS_KEXEC_FILE and so KEXEC_FILE and related options are not available to arm. - The boolean ARCH_SUPPORTS_<option> in effect allows the arch to determine when the feature is allowed. Archs which don't have the feature simply do not provide the corresponding ARCH_SUPPORTS_<option>. For each arch, where there previously were KEXEC and/or CRASH options, these have been replaced with the corresponding boolean ARCH_SUPPORTS_<option>, and an appropriate def_bool statement. For example, if the arch supports KEXEC_FILE, then the ARCH_SUPPORTS_KEXEC_FILE simply has a 'def_bool y'. This permits the KEXEC_FILE option to be available. If the arch has a 'depends on' statement in its original coding of the option, then that expression becomes part of the def_bool expression. For example, arm64 had: config KEXEC depends on PM_SLEEP_SMP and in this solution, this converts to: config ARCH_SUPPORTS_KEXEC def_bool PM_SLEEP_SMP - In order to account for the architecture differences in the coding for the common options, the ARCH_SELECTS_<option> in the arch/<arch>/Kconfig is used. This option has a 'depends on <option>' statement to couple it to the main option, and from there can insert the differences from the common option and the arch original coding of that option. For example, a few archs enable CRYPTO and CRYTPO_SHA256 for KEXEC_FILE. These require a ARCH_SELECTS_KEXEC_FILE and 'select CRYPTO' and 'select CRYPTO_SHA256' statements. Illustrating the option relationships: For each of the common KEXEC and CRASH options: ARCH_SUPPORTS_<option> <- <option> <- ARCH_SELECTS_<option> <option> # in Kconfig.kexec ARCH_SUPPORTS_<option> # in arch/<arch>/Kconfig, as needed ARCH_SELECTS_<option> # in arch/<arch>/Kconfig, as needed For example, KEXEC: ARCH_SUPPORTS_KEXEC <- KEXEC <- ARCH_SELECTS_KEXEC KEXEC # in Kconfig.kexec ARCH_SUPPORTS_KEXEC # in arch/<arch>/Kconfig, as needed ARCH_SELECTS_KEXEC # in arch/<arch>/Kconfig, as needed To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be enabled, and the ARCH_SELECTS_<option> handles side effects (ie. select statements). Examples: A few examples to show the new strategy in action: ===== x86 (minus the help section) ===== Original: config KEXEC bool "kexec system call" select KEXEC_CORE config KEXEC_FILE bool "kexec file based system call" select KEXEC_CORE select HAVE_IMA_KEXEC if IMA depends on X86_64 depends on CRYPTO=y depends on CRYPTO_SHA256=y config ARCH_HAS_KEXEC_PURGATORY def_bool KEXEC_FILE config KEXEC_SIG bool "Verify kernel signature during kexec_file_load() syscall" depends on KEXEC_FILE config KEXEC_SIG_FORCE bool "Require a valid signature in kexec_file_load() syscall" depends on KEXEC_SIG config KEXEC_BZIMAGE_VERIFY_SIG bool "Enable bzImage signature verification support" depends on KEXEC_SIG depends on SIGNED_PE_FILE_VERIFICATION select SYSTEM_TRUSTED_KEYRING config CRASH_DUMP bool "kernel crash dumps" depends on X86_64 \|\| (X86_32 && HIGHMEM) config KEXEC_JUMP bool "kexec jump" depends on KEXEC && HIBERNATION help becomes... New: config ARCH_SUPPORTS_KEXEC def_bool y config ARCH_SUPPORTS_KEXEC_FILE def_bool X86_64 && CRYPTO && CRYPTO_SHA256 config ARCH_SELECTS_KEXEC_FILE def_bool y depends on KEXEC_FILE select HAVE_IMA_KEXEC if IMA config ARCH_SUPPORTS_KEXEC_PURGATORY def_bool KEXEC_FILE config ARCH_SUPPORTS_KEXEC_SIG def_bool y config ARCH_SUPPORTS_KEXEC_SIG_FORCE def_bool y config ARCH_SUPPORTS_KEXEC_BZIMAGE_VERIFY_SIG def_bool y config ARCH_SUPPORTS_KEXEC_JUMP def_bool y config ARCH_SUPPORTS_CRASH_DUMP def_bool X86_64 \|\| (X86_32 && HIGHMEM) ===== powerpc (minus the help section) ===== Original: config KEXEC bool "kexec system call" depends on PPC_BOOK3S \|\| PPC_E500 \|\| (44x && !SMP) select KEXEC_CORE config KEXEC_FILE bool "kexec file based system call" select KEXEC_CORE select HAVE_IMA_KEXEC if IMA select KEXEC_ELF depends on PPC64 depends on CRYPTO=y depends on CRYPTO_SHA256=y config ARCH_HAS_KEXEC_PURGATORY def_bool KEXEC_FILE config CRASH_DUMP bool "Build a dump capture kernel" depends on PPC64 \|\| PPC_BOOK3S_32 \|\| PPC_85xx \|\| (44x && !SMP) select RELOCATABLE if PPC64 \|\| 44x \|\| PPC_85xx becomes... New: config ARCH_SUPPORTS_KEXEC def_bool PPC_BOOK3S \|\| PPC_E500 \|\| (44x && !SMP) config ARCH_SUPPORTS_KEXEC_FILE def_bool PPC64 && CRYPTO=y && CRYPTO_SHA256=y config ARCH_SUPPORTS_KEXEC_PURGATORY def_bool KEXEC_FILE config ARCH_SELECTS_KEXEC_FILE def_bool y depends on KEXEC_FILE select KEXEC_ELF select HAVE_IMA_KEXEC if IMA config ARCH_SUPPORTS_CRASH_DUMP def_bool PPC64 \|\| PPC_BOOK3S_32 \|\| PPC_85xx \|\| (44x && !SMP) config ARCH_SELECTS_CRASH_DUMP def_bool y depends on CRASH_DUMP select RELOCATABLE if PPC64 \|\| 44x \|\| PPC_85xx Testing Approach and Results There are 388 config files in the arch/<arch>/configs directories. For each of these config files, a .config is generated both before and after this Kconfig series, and checked for equivalence. This approach allows for a rather rapid check of all architectures and a wide variety of configs wrt/ KEXEC and CRASH, and avoids requiring compiling for all architectures and running kernels and run-time testing. For each config file, the olddefconfig, allnoconfig and allyesconfig targets are utilized. In testing the randconfig has revealed problems as well, but is not used in the before and after equivalence check since one can not generate the "same" .config for before and after, even if using the same KCONFIG_SEED since the option list is different. As such, the following script steps compare the before and after of 'make olddefconfig'. The new symbols introduced by this series are filtered out, but otherwise the config files are PASS only if they were equivalent, and FAIL otherwise. The script performs the test by doing the following: # Obtain the "golden" .config output for given config file # Reset test sandbox git checkout master git branch -D test_Kconfig git checkout -B test_Kconfig master make distclean # Write out updated config cp -f <config file> .config make ARCH=<arch> olddefconfig # Track each item in .config, LHSB is "golden" scoreboard .config # Obtain the "changed" .config output for given config file # Reset test sandbox make distclean # Apply this Kconfig series git am <this Kconfig series> # Write out updated config cp -f <config file> .config make ARCH=<arch> olddefconfig # Track each item in .config, RHSB is "changed" scoreboard .config # Determine test result # Filter-out new symbols introduced by this series # Filter-out symbol=n which not in either scoreboard # Compare LHSB "golden" and RHSB "changed" scoreboards and issue PASS/FAIL The script was instrumental during the refactoring of Kconfig as it continually revealed problems. The end result being that the solution presented in this series passes all configs as checked by the script, with the following exceptions: - arch/ia64/configs/zx1_config with olddefconfig This config file has: # CONFIG_KEXEC is not set CONFIG_CRASH_DUMP=y and this refactor now couples KEXEC to CRASH_DUMP, so it is not possible to enable CRASH_DUMP without KEXEC. - arch/sh/configs/* with allyesconfig The arch/sh/Kconfig codes CRASH_DUMP as dependent upon BROKEN_ON_MMU (which clearly is not meant to be set). This symbol is not provided but with the allyesconfig it is set to yes which enables CRASH_DUMP. But KEXEC is coded as dependent upon MMU, and is set to no in arch/sh/mm/Kconfig, so KEXEC is not enabled. This refactor now couples KEXEC to CRASH_DUMP, so it is not possible to enable CRASH_DUMP without KEXEC. While the above exceptions are not equivalent to their original, the config file produced is valid (and in fact better wrt/ CRASH_DUMP handling). This patch (of 14) The config options for kexec and crash features are consolidated into new file kernel/Kconfig.kexec. Under the "General Setup" submenu is a new submenu "Kexec and crash handling". All the kexec and crash options that were once in the arch-dependent submenu "Processor type and features" are now consolidated in the new submenu. The following options are impacted: - KEXEC - KEXEC_FILE - KEXEC_SIG - KEXEC_SIG_FORCE - KEXEC_BZIMAGE_VERIFY_SIG - KEXEC_JUMP - CRASH_DUMP The three main options are KEXEC, KEXEC_FILE and CRASH_DUMP. Architectures specify support of certain KEXEC and CRASH features with similarly named new ARCH_SUPPORTS_<option> config options. Architectures can utilize the new ARCH_SELECTS_<option> config options to specify additional components when <option> is enabled. To summarize, the ARCH_SUPPORTS_<option> permits the <option> to be enabled, and the ARCH_SELECTS_<option> handles side effects (ie. select statements). Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric DeVolder <[email protected]> Cc: Albert Ou <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Baoquan He <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Cc. "H. Peter Anvin" <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Dave Hansen <[email protected]> # for x86 Cc: Frederic Weisbecker <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: John Paul Adrian Glaubitz <[email protected]> Cc: Juerg Haefliger <[email protected]> Cc: Kees Cook <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Marc Aurèle La France <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Russell King (Oracle) <[email protected]> Cc: Sami Tolvanen <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Sourabh Jain <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: WANG Xuerui <[email protected]> Cc: Will Deacon <[email protected]> Cc: Xin Li <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Zhen Lei <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	acct: replace all non-returning strlcpy with strscpy	Azeem Shaikh	1	-1/+1
	strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Azeem Shaikh <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	signal: print comm and exe name on fatal signals	Vincent Whitchurch	1	-1/+12
	Make the print-fatal-signals message more useful by printing the comm and the exe name for the process which received the fatal signal: Before: potentially unexpected fatal signal 4 potentially unexpected fatal signal 11 After: buggy-program: pool: potentially unexpected fatal signal 4 some-daemon: gdbus: potentially unexpected fatal signal 11 comm used to be present but was removed in commit 681a90ffe829b8ee25d ("arc, print-fatal-signals: reduce duplicated information") because it's also included as part of the later stack trace. Having the comm as part of the main "unexpected fatal..." print is rather useful though when analysing logs, and the exe name is also valuable as shown in the examples above where the comm ends up having some generic name like "pool". [[email protected]: don't include linux/file.h twice] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Vincent Whitchurch <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vineet Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	cred: convert printks to pr_<level>	tiozhang	1	-12/+15
	Use current logging style. Link: https://lkml.kernel.org/r/20230625033452.GA22858@didi-ThinkCentre-M930t-N000 Signed-off-by: tiozhang <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joe Perches <[email protected]> Cc: Kees Cook <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Paulo Alcantara <[email protected]> Cc: Weiping Zhang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	mmu_notifiers: don't invalidate secondary TLBs as part of ↵	Alistair Popple	1	-1/+1
	mmu_notifier_invalidate_range_end() Secondary TLBs are now invalidated from the architecture specific TLB invalidation functions. Therefore there is no need to explicitly notify or invalidate as part of the range end functions. This means we can remove mmu_notifier_invalidate_range_end_only() and some of the ptep_*_notify() functions. Link: https://lkml.kernel.org/r/90d749d03cbab256ca0edeb5287069599566d783.1690292440.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Cc: Andrew Donnellan <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chaitanya Kumar Borah <[email protected]> Cc: Frederic Barrat <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: Kevin Tian <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Nicolin Chen <[email protected]> Cc: Robin Murphy <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Will Deacon <[email protected]> Cc: Zhi Wang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	mm: move is_ioremap_addr() into new header file	Baoquan He	1	-0/+1
	Now is_ioremap_addr() is only used in kernel/iomem.c and gonna be used in mm/ioremap.c. Move it into its own new header file linux/ioremap.h. Link: https://lkml.kernel.org/r/[email protected] Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Laight <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Gerald Schaefer <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: John Paul Adrian Glaubitz <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Niklas Schnelle <[email protected]> Cc: Rich Felker <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	mm/mm_init.c: remove obsolete macro HASH_SMALL	Miaohe Lin	1	-2/+1
	HASH_SMALL only works when parameter numentries is 0. But the sole caller futex_init() never calls alloc_large_system_hash() with numentries set to 0. So HASH_SMALL is obsolete and remove it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Mike Rapoport (IBM) <[email protected]> Cc: André Almeida <[email protected]> Cc: Darren Hart <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-18	mm: remove arguments of show_mem()	Kefeng Wang	1	-1/+1
	All callers of show_mem() pass 0 and NULL, so we can remove the two arguments by directly calling __show_mem(0, NULL, MAX_NR_ZONES - 1) in show_mem(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-08-17	cgroup: Avoid -Wstringop-overflow warnings	Gustavo A. R. Silva	1	-2/+2
	Change the notation from pointer-to-array to pointer-to-pointer. With this, we avoid the compiler complaining about trying to access a region of size zero as an argument during function calls. This is a workaround to prevent the compiler complaining about accessing an array of size zero when evaluating the arguments of a couple of function calls. See below: kernel/cgroup/cgroup.c: In function 'find_css_set': kernel/cgroup/cgroup.c:1206:16: warning: 'find_existing_css_set' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] 1206 \| cset = find_existing_css_set(old_cset, cgrp, template); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/cgroup/cgroup.c:1206:16: note: referencing argument 3 of type 'struct cgroup_subsys_state [0]' kernel/cgroup/cgroup.c:1071:24: note: in a call to function 'find_existing_css_set' 1071 \| static struct css_set find_existing_css_set(struct css_set *old_cset, \| ^~~~~~~~~~~~~~~~~~~~~ With the change to pointer-to-pointer, the functions are not prevented from being executed, and they will do what they have to do when CGROUP_SUBSYS_COUNT == 0. Address the following -Wstringop-overflow warnings seen when built with ARM architecture and aspeed_g4_defconfig configuration (notice that under this configuration CGROUP_SUBSYS_COUNT == 0): kernel/cgroup/cgroup.c:1208:16: warning: 'find_existing_css_set' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:1258:15: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:6089:18: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] kernel/cgroup/cgroup.c:6153:18: warning: 'css_set_hash' accessing 4 bytes in a region of size 0 [-Wstringop-overflow=] This results in no differences in binary output. Link: https://github.com/KSPP/linux/issues/316 Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-08-17	seccomp: Add missing kerndoc notations	Kees Cook	1	-4/+10
	The kerndoc for some struct member and function arguments were missing. Add them. Cc: Andy Lutomirski <[email protected]> Cc: Will Drewry <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Kees Cook <[email protected]>
2023-08-17	tracing: Fix memleak due to race between current_tracer and trace	Zheng Yejian	3	-2/+12
	Kmemleak report a leak in graph_trace_open(): unreferenced object 0xffff0040b95f4a00 (size 128): comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s) hex dump (first 32 bytes): e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}.......... f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e... backtrace: [<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0 [<000000007df90faa>] graph_trace_open+0xb0/0x344 [<00000000737524cd>] __tracing_open+0x450/0xb10 [<0000000098043327>] tracing_open+0x1a0/0x2a0 [<00000000291c3876>] do_dentry_open+0x3c0/0xdc0 [<000000004015bcd6>] vfs_open+0x98/0xd0 [<000000002b5f60c9>] do_open+0x520/0x8d0 [<00000000376c7820>] path_openat+0x1c0/0x3e0 [<00000000336a54b5>] do_filp_open+0x14c/0x324 [<000000002802df13>] do_sys_openat2+0x2c4/0x530 [<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4 [<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394 [<00000000313647bf>] do_el0_svc+0xac/0xec [<000000002ef1c651>] el0_svc+0x20/0x30 [<000000002fd4692a>] el0_sync_handler+0xb0/0xb4 [<000000000c309c35>] el0_sync+0x160/0x180 The root cause is descripted as follows: __tracing_open() { // 1. File 'trace' is being opened; ... iter->trace = tr->current_trace; // 2. Tracer 'function_graph' is // currently set; ... iter->trace->open(iter); // 3. Call graph_trace_open() here, // and memory are allocated in it; ... } s_start() { // 4. The opened file is being read; ... iter->trace = tr->current_trace; // 5. If tracer is switched to // 'nop' or others, then memory // in step 3 are leaked!!! ... } To fix it, in s_start(), close tracer before switching then reopen the new tracer after switching. And some tracers like 'wakeup' may not update 'iter->private' in some cases when reopen, then it should be cleared to avoid being mistakenly closed again. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Fixes: d7350c3f4569 ("tracing/core: make the read callbacks reentrants") Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-08-17	sched/eevdf: Curb wakeup-preemption	Peter Zijlstra	2	-0/+13
	Mike and others noticed that EEVDF does like to over-schedule quite a bit -- which does hurt performance of a number of benchmarks / workloads. In particular, what seems to cause over-scheduling is that when lag is of the same order (or larger) than the request / slice then placement will not only cause the task to be placed left of current, but also with a smaller deadline than current, which causes immediate preemption. [ notably, lag bounds are relative to HZ ] Mike suggested we stick to picking 'current' for as long as it's eligible to run, giving it uninterrupted runtime until it reaches parity with the pack. Augment Mike's suggestion by only allowing it to exhaust it's initial request. One random data point: echo NO_RUN_TO_PARITY > /debug/sched/features perf stat -a -e context-switches --repeat 10 -- perf bench sched messaging -g 20 -t -l 5000 3,723,554 context-switches ( +- 0.56% ) 9.5136 +- 0.0394 seconds time elapsed ( +- 0.41% ) echo RUN_TO_PARITY > /debug/sched/features perf stat -a -e context-switches --repeat 10 -- perf bench sched messaging -g 20 -t -l 5000 2,556,535 context-switches ( +- 0.51% ) 9.2427 +- 0.0302 seconds time elapsed ( +- 0.33% ) Suggested-by: Mike Galbraith <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-08-16	Merge branches 'doc.2023.07.14b', 'fixes.2023.08.16a', ↵	Paul E. McKenney	9	-57/+285
	'rcu-tasks.2023.07.24a', 'rcuscale.2023.07.14b', 'refscale.2023.07.14b', 'torture.2023.08.14a' and 'torturescripts.2023.07.20a' into HEAD doc.2023.07.14b: Documentation updates. fixes.2023.08.16a: Miscellaneous fixes. rcu-tasks.2023.07.24a: RCU Tasks updates. rcuscale.2023.07.14b: RCU (updater) scalability test updates. refscale.2023.07.14b: Reference (reader) scalability test updates. torture.2023.08.14a: Other torture-test updates. torturescripts.2023.07.20a: Other torture-test scripting updates.
2023-08-16	rcu: Make the rcu_nocb_poll boot parameter usable via boot config	Paul E. McKenney	1	-2/+2
	The rcu_nocb_poll kernel boot parameter is defined via early_param(), whose parsing functions are invoked from parse_early_param() which is in turn invoked by setup_arch(), which is very early indeed. It is invoked so early that the console output timestamps read 0.000000, in other words, before time begins. This use of early_param() means that the rcu_nocb_poll kernel boot parameter cannot usefully be embedded into the kernel image. Yes, you can embed it, but setup_boot_config() is invoked from start_kernel() too late for it to be parsed. But it makes no sense to parse this parameter so early. After all, it cannot do anything until the rcuog kthreads are created, which is long after rcu_init() time, let alone setup_boot_config() time. This commit therefore switches the rcu_nocb_poll kernel boot parameter from early_param() to __setup(), which allows boot-config parsing of this parameter, in turn allowing it to be embedded into the kernel image. Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]>
2023-08-16	rcu: Mark __rcu_irq_enter_check_tick() ->rcu_urgent_qs load	Paul E. McKenney	1	-1/+1
	The rcu_request_urgent_qs_task() function does a cross-CPU store to ->rcu_urgent_qs, so this commit therefore marks the load in __rcu_irq_enter_check_tick() with READ_ONCE(). Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]>
2023-08-16	tracing/synthetic: Allocate one additional element for size	Sven Schnelle	1	-1/+2
	While debugging another issue I noticed that the stack trace contains one invalid entry at the end: <idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK: => __schedule+0xac6/0x1a98 => schedule+0x126/0x2c0 => schedule_timeout+0x150/0x2c0 => kcompactd+0x9ca/0xc20 => kthread+0x2f6/0x3d8 => __ret_from_fork+0x8a/0xe8 => 0x6b6b6b6b6b6b6b6b This is because the code failed to add the one element containing the number of entries to field_size. Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-08-16	tracing/synthetic: Skip first entry for stack traces	Sven Schnelle	1	-13/+4
	While debugging another issue I noticed that the stack trace output contains the number of entries on top: <idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK: => 0x10 => __schedule+0xac6/0x1a98 => schedule+0x126/0x2c0 => schedule_timeout+0x242/0x2c0 => __wait_for_common+0x434/0x680 => __wait_rcu_gp+0x198/0x3e0 => synchronize_rcu+0x112/0x138 => ring_buffer_reset_online_cpus+0x140/0x2e0 => tracing_reset_online_cpus+0x15c/0x1d0 => tracing_set_clock+0x180/0x1d8 => hist_register_trigger+0x486/0x670 => event_hist_trigger_parse+0x494/0x1318 => trigger_process_regex+0x1d4/0x258 => event_trigger_write+0xb4/0x170 => vfs_write+0x210/0xad0 => ksys_write+0x122/0x208 Fix this by skipping the first element. Also replace the pointer logic with an index variable which is easier to read. Link: https://lkml.kernel.org/r/[email protected] Cc: Masami Hiramatsu <[email protected]> Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-08-16	tracing/synthetic: Use union instead of casts	Sven Schnelle	2	-50/+45
	The current code uses a lot of casts to access the fields member in struct synth_trace_events with different sizes. This makes the code hard to read, and had already introduced an endianness bug. Use a union and struct instead. Link: https://lkml.kernel.org/r/[email protected] Cc: [email protected] Cc: Masami Hiramatsu <[email protected]> Fixes: 00cf3d672a9dd ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Sven Schnelle <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-08-16	tracing: Fix cpu buffers unavailable due to 'record_disabled' missed	Zheng Yejian	1	-0/+6
	Trace ring buffer can no longer record anything after executing following commands at the shell prompt: # cd /sys/kernel/tracing # cat tracing_cpumask fff # echo 0 > tracing_cpumask # echo 1 > snapshot # echo fff > tracing_cpumask # echo 1 > tracing_on # echo "hello world" > trace_marker -bash: echo: write error: Bad file descriptor The root cause is that: 1. After `echo 0 > tracing_cpumask`, 'record_disabled' of cpu buffers in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask()); 2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped with 'tr->max_buffer.buffer', then the 'record_disabled' became 0 (see update_max_tr()); 3. After `echo fff > tracing_cpumask`, the 'record_disabled' become -1; Then array_buffer and max_buffer are both unavailable due to value of 'record_disabled' is not 0. To fix it, enable or disable both array_buffer and max_buffer at the same time in tracing_set_cpumask(). Link: https://lkml.kernel.org/r/[email protected] Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Fixes: 71babb2705e2 ("tracing: change CPU ring buffer state from tracing_cpumask") Signed-off-by: Zheng Yejian <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-08-16	printk: export symbols for debug modules	Enlin Mu	1	-0/+2
	the module is out-of-tree, it saves kernel logs when panic Signed-off-by: Enlin Mu <[email protected]> Acked-by: Petr Mladek <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-08-16	bpf: Fix uninitialized symbol in bpf_perf_link_fill_kprobe()	Yafang Shao	1	-3/+2
	The commit 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event") leads to the following Smatch static checker warning: kernel/bpf/syscall.c:3416 bpf_perf_link_fill_kprobe() error: uninitialized symbol 'type'. That can happens when uname is NULL. So fix it by verifying the uname when we really need to fill it. Fixes: 1b715e1b0ec5 ("bpf: Support ->fill_link_info for perf_event") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Yafang Shao <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Yonghong Song <[email protected]> Acked-by: Jiri Olsa <[email protected]> Closes: https://lore.kernel.org/bpf/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2023-08-16	perf/hw_breakpoint: Remove arch breakpoint hooks	Benjamin Gray	1	-28/+0
	PowerPC was the only user of these hooks, and has been refactored to no longer require them. There is no need to keep them around, so remove them to reduce complexity. Signed-off-by: Benjamin Gray <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2023-08-15	sysctl: Add size to register_sysctl	Joel Granados	1	-1/+1
	This commit adds table_size to register_sysctl in preparation for the removal of the sentinel elements in the ctl_table arrays (last empty markers). And though we do not remove any sentinels in this commit, we set things up by either passing the table_size explicitly or using ARRAY_SIZE on the ctl_table arrays. We replace the register_syctl function with a macro that will add the ARRAY_SIZE to the new register_sysctl_sz function. In this way the callers that are already using an array of ctl_table structs do not change. For the callers that pass a ctl_table array pointer, we pass the table_size to register_sysctl_sz instead of the macro. Signed-off-by: Joel Granados <[email protected]> Suggested-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-08-15	sysctl: Add a size arg to __register_sysctl_table	Joel Granados	1	-1/+2
	We make these changes in order to prepare __register_sysctl_table and its callers for when we remove the sentinel element (empty element at the end of ctl_table arrays). We don't actually remove any sentinels in this commit, but we do make sure to use ARRAY_SIZE so the table_size is available when the removal occurs. We add a table_size argument to __register_sysctl_table and adjust callers, all of which pass ctl_table pointers and need an explicit call to ARRAY_SIZE. We implement a size calculation in register_net_sysctl in order to forward the size of the array pointer received from the network register calls. The new table_size argument does not yet have any effect in the init_header call which is still dependent on the sentinel's presence. table_size does however drive the `kzalloc` allocation in __register_sysctl_table with no adverse effects as the allocated memory is either one element greater than the calculated ctl_table array (for the calls in ipc_sysctl.c, mq_sysctl.c and ucount.c) or the exact size of the calculated ctl_table array (for the call from sysctl_net.c and register_sysctl). This approach will allows us to "just" remove the sentinel without further changes to __register_sysctl_table as table_size will represent the exact size for all the callers at that point. Signed-off-by: Joel Granados <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-08-15	audit: move trailing statements to next line	Atul Kumar Pant	2	-2/+4
	Fixes following checkpatch.pl issue: ERROR: trailing statements should be on next line Signed-off-by: Atul Kumar Pant <[email protected]> [PM: subject line tweak] Signed-off-by: Paul Moore <[email protected]>
2023-08-15	audit: cleanup function braces and assignment-in-if-condition	Atul Kumar Pant	1	-2/+4
	The patch fixes following checkpatch.pl issue: ERROR: open brace '{' following function definitions go on the next line ERROR: do not use assignment in if condition Signed-off-by: Atul Kumar Pant <[email protected]> [PM: subject line tweaks] Signed-off-by: Paul Moore <[email protected]>
2023-08-15	audit: add space before parenthesis and around '=', "==", and '<'	Atul Kumar Pant	3	-10/+10
	Fixes following checkpatch.pl issue: ERROR: space required before the open parenthesis '(' ERROR: spaces required around that '=' ERROR: spaces required around that '<' ERROR: spaces required around that '==' Signed-off-by: Atul Kumar Pant <[email protected]> [PM: subject line tweaks] Signed-off-by: Paul Moore <[email protected]>
2023-08-14	bpf: Support default .validate() and .update() behavior for struct_ops links	David Vernet	1	-6/+9
	Currently, if a struct_ops map is loaded with BPF_F_LINK, it must also define the .validate() and .update() callbacks in its corresponding struct bpf_struct_ops in the kernel. Enabling struct_ops link is useful in its own right to ensure that the map is unloaded if an application crashes. For example, with sched_ext, we want to automatically unload the host-wide scheduler if the application crashes. We would likely never support updating elements of a sched_ext struct_ops map, so we'd have to implement these callbacks showing that they _can't_ support element updates just to benefit from the basic lifetime management of struct_ops links. Let's enable struct_ops maps to work with BPF_F_LINK even if they haven't defined these callbacks, by assuming that a struct_ops map element cannot be updated by default. Acked-by: Kui-Feng Lee <[email protected]> Signed-off-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-08-14	cgroup:namespace: Remove unused cgroup_namespaces_init()	Lu Jialin	1	-6/+0
	cgroup_namspace_init() just return 0. Therefore, there is no need to call it during start_kernel. Just remove it. Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces") Signed-off-by: Lu Jialin <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-08-14	workqueue: Rename rescuer kworker	Aaron Tomlin	1	-1/+1
	Each CPU-specific and unbound kworker kthread conforms to a particular naming scheme. However, this does not extend to the rescuer kworker. At present, a rescuer kworker is simply named according to its workqueue's name. This can be cryptic. This patch modifies a rescuer to follow the kworker naming scheme. The "R" is indicative of a rescuer and after "-" is its workqueue's name e.g. "kworker/R-ext4-rsv-conver". tj: Use "R" instead of "r" as the prefix to make it more distinctive and consistent with how highpri pools are marked. Signed-off-by: Aaron Tomlin <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2023-08-14	rcutorture: Stop right-shifting torture_random() return values	Paul E. McKenney	1	-3/+3
	Now that torture_random() uses swahw32(), its callers no longer see not-so-random low-order bits, as these are now swapped up into the upper 16 bits of the torture_random() function's return value. This commit therefore removes the right-shifting of torture_random() return values. Signed-off-by: Paul E. McKenney <[email protected]>
2023-08-14	torture: Stop right-shifting torture_random() return values	Paul E. McKenney	1	-2/+2
	Now that torture_random() uses swahw32(), its callers no longer see not-so-random low-order bits, as these are now swapped up into the upper 16 bits of the torture_random() function's return value. This commit therefore removes the right-shifting of torture_random() return values. Signed-off-by: Paul E. McKenney <[email protected]>
2023-08-14	torture: Move stutter_wait() timeouts to hrtimers	Paul E. McKenney	1	-2/+2
	In order to gain better race coverage, move the test start/stop waits in stutter_wait() to torture_hrtimeout_jiffies(). Signed-off-by: Paul E. McKenney <[email protected]>
2023-08-14	torture: Move torture_shuffle() timeouts to hrtimers	Paul E. McKenney	1	-1/+3
	In order to gain better race coverage, move the CPU-migration timed waits in torture_shuffle() to torture_hrtimeout_jiffies(). Signed-off-by: Paul E. McKenney <[email protected]>
2023-08-14	torture: Move torture_onoff() timeouts to hrtimers	Paul E. McKenney	1	-3/+3
	In order to gain better race coverage, move the CPU-hotplug-related timed waits in torture_onoff() to torture_hrtimeout_jiffies(). Signed-off-by: Paul E. McKenney <[email protected]>