aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-05-20perf: Fix sys_perf_event_open() race against selfPeter Zijlstra1-0/+14
Norbert reported that it's possible to race sys_perf_event_open() such that the looser ends up in another context from the group leader, triggering many WARNs. The move_group case checks for races against itself, but the !move_group case doesn't, seemingly relying on the previous group_leader->ctx == ctx check. However, that check is racy due to not holding any locks at that time. Therefore, re-check the result after acquiring locks and bailing if they no longer match. Additionally, clarify the not_move_group case from the move_group-vs-move_group race. Fixes: f63a8daa5812 ("perf: Fix event->ctx locking") Reported-by: Norbert Slusarek <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-05-20Merge branches 'for-next/sme', 'for-next/stacktrace', ↵Catalin Marinas3-2/+31
'for-next/fault-in-subpage', 'for-next/misc', 'for-next/ftrace' and 'for-next/crashkernel', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf/arm-cmn: Decode CAL devices properly in debugfs perf/arm-cmn: Fix filter_sel lookup perf/marvell_cn10k: Fix tad_pmu_event_init() to check pmu type first drivers/perf: hisi: Add Support for CPA PMU drivers/perf: hisi: Associate PMUs in SICL with CPUs online drivers/perf: arm_spe: Expose saturating counter to 16-bit perf/arm-cmn: Add CMN-700 support perf/arm-cmn: Refactor occupancy filter selector perf/arm-cmn: Add CMN-650 support dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700 perf: check return value of armpmu_request_irq() perf: RISC-V: Remove non-kernel-doc ** comments * for-next/sme: (30 commits) : Scalable Matrix Extensions support. arm64/sve: Move sve_free() into SVE code section arm64/sve: Make kernel FPU protection RT friendly arm64/sve: Delay freeing memory in fpsimd_flush_thread() arm64/sme: More sensibly define the size for the ZA register set arm64/sme: Fix NULL check after kzalloc arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding() arm64/sme: Provide Kconfig for SME KVM: arm64: Handle SME host state when running guests KVM: arm64: Trap SME usage in guest KVM: arm64: Hide SME system registers from guests arm64/sme: Save and restore streaming mode over EFI runtime calls arm64/sme: Disable streaming mode and ZA when flushing CPU state arm64/sme: Add ptrace support for ZA arm64/sme: Implement ptrace support for streaming mode SVE registers arm64/sme: Implement ZA signal handling arm64/sme: Implement streaming SVE signal handling arm64/sme: Disable ZA and streaming mode when handling signals arm64/sme: Implement traps and syscall handling for SME arm64/sme: Implement ZA context switching arm64/sme: Implement streaming SVE context switching ... * for-next/stacktrace: : Stacktrace cleanups. arm64: stacktrace: align with common naming arm64: stacktrace: rename stackframe to unwind_state arm64: stacktrace: rename unwinder functions arm64: stacktrace: make struct stackframe private to stacktrace.c arm64: stacktrace: delete PCS comment arm64: stacktrace: remove NULL task check from unwind_frame() * for-next/fault-in-subpage: : btrfs search_ioctl() live-lock fix using fault_in_subpage_writeable(). btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults arm64: Add support for user sub-page fault probing mm: Add fault_in_subpage_writeable() to probe at sub-page granularity * for-next/misc: : Miscellaneous patches. arm64: Kconfig.platforms: Add comments arm64: Kconfig: Fix indentation and add comments arm64: mm: avoid writable executable mappings in kexec/hibernate code arm64: lds: move special code sections out of kernel exec segment arm64/hugetlb: Implement arm64 specific huge_ptep_get() arm64/hugetlb: Use ptep_get() to get the pte value of a huge page arm64: mm: Make arch_faults_on_old_pte() check for migratability arm64: mte: Clean up user tag accessors arm64/hugetlb: Drop TLB flush from get_clear_flush() arm64: Declare non global symbols as static arm64: mm: Cleanup useless parameters in zone_sizes_init() arm64: fix types in copy_highpage() arm64: Set ARCH_NR_GPIO to 2048 for ARCH_APPLE arm64: cputype: Avoid overflow using MIDR_IMPLEMENTOR_MASK arm64: document the boot requirements for MTE arm64/mm: Compute PTRS_PER_[PMD|PUD] independently of PTRS_PER_PTE * for-next/ftrace: : ftrace cleanups. arm64/ftrace: Make function graph use ftrace directly ftrace: cleanup ftrace_graph_caller enable and disable * for-next/crashkernel: : Support for crashkernel reservations above ZONE_DMA. arm64: kdump: Do not allocate crash low memory if not needed docs: kdump: Update the crashkernel description for arm64 of: Support more than one crash kernel regions for kexec -s of: fdt: Add memory for devices by DT property "linux,usable-memory-range" arm64: kdump: Reimplement crashkernel=X arm64: Use insert_resource() to simplify code kdump: return -ENOENT if required cmdline option does not exist
2022-05-20Merge tag 'irqchip-5.19' of ↵Thomas Gleixner18-745/+651
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - Add new infrastructure to stop gpiolib from rewriting irq_chip structures behind our back. Convert a few of them, but this will obviously be a long effort. - A bunch of GICv3 improvements, such as using MMIO-based invalidations when possible, and reducing the amount of polling we perform when reconfiguring interrupts. - Another set of GICv3 improvements for the Pseudo-NMI functionality, with a nice cleanup making it easy to reason about the various states we can be in when an NMI fires. - The usual bunch of misc fixes and minor improvements. Link: https://lore.kernel.org/all/[email protected]
2022-05-19cgroup: remove the superfluous judgmentShida Zhang1-1/+1
Remove the superfluous judgment since the function is never called for a root cgroup, as suggested by Tejun. Suggested-by: Tejun Heo <[email protected]> Signed-off-by: Shida Zhang <[email protected]> Reviewed-by: Michal Koutný <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2022-05-19blob_to_mnt(): kern_unmount() is needed to undo kern_mount()Al Viro1-2/+2
plain mntput() won't do. Signed-off-by: Al Viro <[email protected]>
2022-05-19sched: Reverse sched_class layoutPeter Zijlstra2-13/+14
Because GCC-12 is fully stupid about array bounds and it's just really hard to get a solid array definition from a linker script, flip the array order to avoid needing negative offsets :-/ This makes the whole relational pointer magic a little less obvious, but alas. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-05-19sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}Uros Bizjak1-2/+2
Use try_cmpxchg64 instead of cmpxchg64 (*ptr, old, new) != old in sched_clock_{local,remote}. x86 cmpxchg returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2022-05-19mm: khugepaged: make khugepaged_enter() void functionYang Shi1-3/+1
The most callers of khugepaged_enter() don't care about the return value. Only dup_mmap(), anonymous THP page fault and MADV_HUGEPAGE handle the error by returning -ENOMEM. Actually it is not harmful for them to ignore the error case either. It also sounds overkilling to fail fork() and page fault early due to khugepaged_enter() error, and MADV_HUGEPAGE does set VM_HUGEPAGE flag regardless of the error. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Vlastmil Babka <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Song Liu <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-05-19kexec_file: Fix kexec_file.c build error for riscv platformLiao Chang1-2/+2
When CONFIG_KEXEC_FILE is set for riscv platform, the compilation of kernel/kexec_file.c generate build error: kernel/kexec_file.c: In function 'crash_prepare_elf64_headers': ./arch/riscv/include/asm/page.h:110:71: error: request for member 'virt_addr' in something not a structure or union 110 | ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < kernel_map.virt_addr)) | ^ ./arch/riscv/include/asm/page.h:131:2: note: in expansion of macro 'is_linear_mapping' 131 | is_linear_mapping(_x) ? \ | ^~~~~~~~~~~~~~~~~ ./arch/riscv/include/asm/page.h:140:31: note: in expansion of macro '__va_to_pa_nodebug' 140 | #define __phys_addr_symbol(x) __va_to_pa_nodebug(x) | ^~~~~~~~~~~~~~~~~~ ./arch/riscv/include/asm/page.h:143:24: note: in expansion of macro '__phys_addr_symbol' 143 | #define __pa_symbol(x) __phys_addr_symbol(RELOC_HIDE((unsigned long)(x), 0)) | ^~~~~~~~~~~~~~~~~~ kernel/kexec_file.c:1327:36: note: in expansion of macro '__pa_symbol' 1327 | phdr->p_offset = phdr->p_paddr = __pa_symbol(_text); This occurs is because the "kernel_map" referenced in macro is_linear_mapping() is suppose to be the one of struct kernel_mapping defined in arch/riscv/mm/init.c, but the 2nd argument of crash_prepare_elf64_header() has same symbol name, in expansion of macro is_linear_mapping in function crash_prepare_elf64_header(), "kernel_map" actually is the local variable. Signed-off-by: Liao Chang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2022-05-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski9-16/+21
drivers/net/ethernet/mellanox/mlx5/core/main.c b33886971dbc ("net/mlx5: Initialize flow steering during driver probe") 40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support") f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table") https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/ 16d42d313350 ("net/mlx5: Drain fw_reset when removing device") 8324a02c342a ("net/mlx5: Add exit route when waiting for FW") https://lore.kernel.org/all/[email protected]/ tools/testing/selftests/net/mptcp/mptcp_join.sh e274f7154008 ("selftests: mptcp: add subflow limits test-cases") b6e074e171bc ("selftests: mptcp: add infinite map testcase") 5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type") https://lore.kernel.org/all/[email protected]/ net/mptcp/options.c ba2c89e0ea74 ("mptcp: fix checksum byte order") 1e39e5a32ad7 ("mptcp: infinite mapping sending") ea66758c1795 ("tcp: allow MPTCP to update the announced window") https://lore.kernel.org/all/[email protected]/ net/mptcp/pm.c 95d686517884 ("mptcp: fix subflow accounting on close") 4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") https://lore.kernel.org/all/[email protected]/ net/mptcp/subflow.c ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure") 0348c690ed37 ("mptcp: add the fallback check") f8d4bcacff3b ("mptcp: infinite mapping receiving") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-19kernel/reboot: Add devm_register_restart_handler()Dmitry Osipenko1-0/+22
Add devm_register_restart_handler() helper that registers sys-off handler using restart mode and with a default priority. Most drivers will want to register restart handler with a default priority, so this helper will reduce the boilerplate code and make code easier to read and follow. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19kernel/reboot: Add devm_register_power_off_handler()Dmitry Osipenko1-0/+22
Add devm_register_power_off_handler() helper that registers sys-off handler using power-off mode and with a default priority. Most drivers will want to register power-off handler with a default priority, so this helper will reduce the boilerplate code and make code easier to read and follow. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19reboot: Remove pm_power_off_prepare()Dmitry Osipenko1-19/+0
All pm_power_off_prepare() users were converted to sys-off handler API. Remove the obsolete global callback variable. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19kernel/reboot: Add register_platform_power_off()Dmitry Osipenko1-0/+55
Add platform-level registration helpers that will ease transition of the arch/platform power-off callbacks to the new sys-off based API, allowing us to remove the global pm_power_off variable in the future. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19kernel/reboot: Add kernel_can_power_off()Dmitry Osipenko1-1/+13
Add kernel_can_power_off() helper that replaces open-coded checks of the global pm_power_off variable. This is a necessary step towards supporting chained power-off handlers. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19kernel/reboot: Add stub for pm_power_offDmitry Osipenko1-0/+6
Add weak stub for the global pm_power_off callback variable. This will allow us to remove pm_power_off definitions from arch/ code and transition to the new sys-off based API that will replace the global variable. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19kernel/reboot: Add do_kernel_power_off()Dmitry Osipenko1-0/+13
Add do_kernel_power_off() helper that will remove open-coded pm_power_off invocations from the architecture code. This is the first step on the way to remove the global pm_power_off variable, which will allow us to implement consistent power-off chaining support. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19kernel/reboot: Wrap legacy power-off callbacks into sys-off handlersDmitry Osipenko1-2/+42
Wrap legacy power-off callbacks into sys-off handlers in order to support co-existence of both legacy and new callbacks while we're in process of upgrading legacy callbacks to the new API. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19kernel/reboot: Introduce sys-off handler APIDmitry Osipenko1-0/+182
In order to support power-off chaining we need to get rid of the global pm_* variables, replacing them with the new kernel API functions that support chaining. Introduce new generic sys-off handler API that brings the following features: 1. Power-off and restart handlers are registered using same API function that supports chaining, hence all power-off and restart modes will support chaining using this unified function. 2. Prevents notifier priority collisions by disallowing registration of multiple handlers at the non-default priority level. 3. Supports passing opaque user argument to callback, which allows us to remove global variables from drivers. This patch adds support of the following sys-off modes: - SYS_OFF_MODE_POWER_OFF_PREPARE that replaces global pm_power_off_prepare variable and provides chaining support for power-off-prepare handlers. - SYS_OFF_MODE_POWER_OFF that replaces global pm_power_off variable and provides chaining support for power-off handlers. - SYS_OFF_MODE_RESTART that provides a better restart API, removing a need from drivers to have a global scratch variable by utilizing the opaque callback argument. Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19notifier: Add blocking/atomic_notifier_chain_register_unique_prio()Dmitry Osipenko1-19/+69
Add variant of blocking/atomic_notifier_chain_register() functions that allow registration of a notifier only if it has unique priority, otherwise -EBUSY error code is returned by the new functions. Reviewed-by: Michał Mirosław <[email protected]> Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-19notifier: Add atomic_notifier_call_chain_is_empty()Dmitry Osipenko1-0/+13
Add atomic_notifier_call_chain_is_empty() that returns true if given atomic call chain is empty. The first user of this new notifier API function will be the kernel power-off core code that will support power-off call chains. The core code will need to check whether there is a power-off handler registered at all in order to decide whether to halt machine or power it off. Reviewed-by: Michał Mirosław <[email protected]> Signed-off-by: Dmitry Osipenko <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2022-05-18cgroup: Make cgroup_debug staticXiu Jianfeng2-2/+1
Make cgroup_debug static since it's only used in cgroup.c Signed-off-by: Xiu Jianfeng <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2022-05-18random32: use real rng for non-deterministic randomnessJason A. Donenfeld1-2/+0
random32.c has two random number generators in it: one that is meant to be used deterministically, with some predefined seed, and one that does the same exact thing as random.c, except does it poorly. The first one has some use cases. The second one no longer does and can be replaced with calls to random.c's proper random number generator. The relatively recent siphash-based bad random32.c code was added in response to concerns that the prior random32.c was too deterministic. Out of fears that random.c was (at the time) too slow, this code was anonymously contributed. Then out of that emerged a kind of shadow entropy gathering system, with its own tentacles throughout various net code, added willy nilly. Stop👏making👏bespoke👏random👏number👏generators👏. Fortunately, recent advances in random.c mean that we can stop playing with this sketchiness, and just use get_random_u32(), which is now fast enough. In micro benchmarks using RDPMC, I'm seeing the same median cycle count between the two functions, with the mean being _slightly_ higher due to batches refilling (which we can optimize further need be). However, when doing *real* benchmarks of the net functions that actually use these random numbers, the mean cycles actually *decreased* slightly (with the median still staying the same), likely because the additional prandom code means icache misses and complexity, whereas random.c is generally already being used by something else nearby. The biggest benefit of this is that there are many users of prandom who probably should be using cryptographically secure random numbers. This makes all of those accidental cases become secure by just flipping a switch. Later on, we can do a tree-wide cleanup to remove the static inline wrapper functions that this commit adds. There are also some low-ish hanging fruits for making this even faster in the future: a get_random_u16() function for use in the networking stack will give a 2x performance boost there, using SIMD for ChaCha20 will let us compute 4 or 8 or 16 blocks of output in parallel, instead of just one, giving us large buffers for cheap, and introducing a get_random_*_bh() function that assumes irqs are already disabled will shave off a few cycles for ordinary calls. These are things we can chip away at down the road. Acked-by: Jakub Kicinski <[email protected]> Acked-by: Theodore Ts'o <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2022-05-17audit,io_uring,io-wq: call __audit_uring_exit for dummy contextsJulian Orth1-0/+6
Not calling the function for dummy contexts will cause the context to not be reset. During the next syscall, this will cause an error in __audit_syscall_entry: WARN_ON(context->context != AUDIT_CTX_UNUSED); WARN_ON(context->name_count); if (context->context != AUDIT_CTX_UNUSED || context->name_count) { audit_panic("unrecoverable error in audit_syscall_entry()"); return; } These problematic dummy contexts are created via the following call chain: exit_to_user_mode_prepare -> arch_do_signal_or_restart -> get_signal -> task_work_run -> tctx_task_work -> io_req_task_submit -> io_issue_sqe -> audit_uring_entry Cc: [email protected] Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") Signed-off-by: Julian Orth <[email protected]> [PM: subject line tweaks] Signed-off-by: Paul Moore <[email protected]>
2022-05-17swiotlb: max mapping size takes min align mask into accountTianyu Lan1-1/+12
swiotlb_find_slots() skips slots according to io tlb aligned mask calculated from min aligned mask and original physical address offset. This affects max mapping size. The mapping size can't achieve the IO_TLB_SEGSIZE * IO_TLB_SIZE when original offset is non-zero. This will cause system boot up failure in Hyper-V Isolation VM where swiotlb force is enabled. Scsi layer use return value of dma_max_mapping_size() to set max segment size and it finally calls swiotlb_max_mapping_size(). Hyper-V storage driver sets min align mask to 4k - 1. Scsi layer may pass 256k length of request buffer with 0~4k offset and Hyper-V storage driver can't get swiotlb bounce buffer via DMA API. Swiotlb_find_slots() can't find 256k length bounce buffer with offset. Make swiotlb_max_mapping _size() take min align mask into account. Signed-off-by: Tianyu Lan <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-05-16kcsan: test: use new suite_{init,exit} supportMarco Elver1-18/+13
Use the newly added suite_{init,exit} support for suite-wide init and cleanup. This avoids the unsupported method by which the test used to do suite-wide init and cleanup (avoiding issues such as missing TAP headers, and possible future conflicts). Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2022-05-15Merge tag 'sched-urgent-2022-05-15' of ↵Linus Torvalds7-15/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Thomas Gleixner: "The recent expansion of the sched switch tracepoint inserted a new argument in the middle of the arguments. This reordering broke BPF programs which relied on the old argument list. While tracepoints are not considered stable ABI, it's not trivial to make BPF cope with such a change, but it's being worked on. For now restore the original argument order and move the new argument to the end of the argument list" * tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/tracing: Append prev_state to tp args instead
2022-05-15Merge tag 'irq-urgent-2022-05-15' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single fix for a recent (introduced in 5.16) regression in the core interrupt code. The consolidation of the interrupt handler invocation code added an unconditional warning when generic_handle_domain_irq() is invoked from outside hard interrupt context. That's overbroad as the requirement for invoking these handlers in hard interrupt context is only required for certain interrupt types. The subsequently called code already contains a warning which triggers conditionally for interrupt chips which indicate this requirement in their properties. Remove the overbroad one" * tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq()
2022-05-14genirq/irq_sim: Make the irq_work always run in hard irq contextSebastian Andrzej Siewior1-1/+1
The IRQ simulator uses irq_work to trigger an interrupt. Without the IRQ_WORK_HARD_IRQ flag the irq_work will be performed in thread context on PREEMPT_RT. This causes locking errors later in handle_simple_irq() which expects to be invoked with disabled interrupts. Triggering individual interrupts in hardirq context should not lead to unexpected high latencies since this is also what the hardware controller does. Also it is used as a simulator so... Use IRQ_WORK_INIT_HARD() to carry out the irq_work in hardirq context on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-14timers: Provide a better debugobjects hint for delayed worksStephen Boyd1-1/+31
With debugobjects enabled the timer hint for freeing of active timers embedded inside delayed works is always the same, i.e. the hint is delayed_work_timer_fn, even though the function the delayed work is going to run can be wildly different depending on what work was queued. Enabling workqueue debugobjects doesn't help either because the delayed work isn't considered active until it is actually queued to run on a workqueue. If the work is freed while the timer is pending the work isn't considered active so there is no information from workqueue debugobjects. Special case delayed works in the timer debugobjects hint logic so that the delayed work function is returned instead of the delayed_work_timer_fn. This will help to understand which delayed work was pending that got freed. Apply the same treatment for kthread_delayed_work because it follows the same pattern. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-13bpf: Add MEM_UNINIT as a bpf_type_flagJoanne Koong2-22/+10
Instead of having uninitialized versions of arguments as separate bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag modifier to denote that the argument is uninitialized. Doing so cleans up some of the logic in the verifier. We no longer need to do two checks against an argument type (eg "if (base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) == ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized versions of the same argument type will now share the same base type. In the near future, MEM_UNINIT will be used by dynptr helper functions as well. Signed-off-by: Joanne Koong <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-05-13timekeeping: Add raw clock fallback for random_get_entropy()Jason A. Donenfeld1-0/+15
The addition of random_get_entropy_fallback() provides access to whichever time source has the highest frequency, which is useful for gathering entropy on platforms without available cycle counters. It's not necessarily as good as being able to quickly access a cycle counter that the CPU has, but it's still something, even when it falls back to being jiffies-based. In the event that a given arch does not define get_cycles(), falling back to the get_cycles() default implementation that returns 0 is really not the best we can do. Instead, at least calling random_get_entropy_fallback() would be preferable, because that always needs to return _something_, even falling back to jiffies eventually. It's not as though random_get_entropy_fallback() is super high precision or guaranteed to be entropic, but basically anything that's not zero all the time is better than returning zero all the time. Finally, since random_get_entropy_fallback() is used during extremely early boot when randomizing freelists in mm_init(), it can be called before timekeeping has been initialized. In that case there really is nothing we can do; jiffies hasn't even started ticking yet. So just give up and return 0. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Theodore Ts'o <[email protected]>
2022-05-13printk: stop including cache.h from printk.hPeter Collingbourne1-0/+1
An inclusion of cache.h in printk.h was added in 2014 in commit c28aa1f0a847 ("printk/cache: mark printk_once test variable __read_mostly") in order to bring in the definition of __read_mostly. The usage of __read_mostly was later removed in commit 3ec25826ae33 ("printk: Tie printk_once / printk_deferred_once into .data.once for reset") which made the inclusion of cache.h unnecessary, so remove it. We have a small amount of code that depended on the inclusion of cache.h from printk.h; fix that code to include the appropriate header. This fixes a circular inclusion on arm64 (linux/printk.h -> linux/cache.h -> asm/cache.h -> linux/kasan-enabled.h -> linux/static_key.h -> linux/jump_label.h -> linux/bug.h -> asm/bug.h -> linux/printk.h) that would otherwise be introduced by the next patch. Build tested using {allyesconfig,defconfig} x {arm64,x86_64}. Link: https://linux-review.googlesource.com/id/I8fd51f72c9ef1f2d6afd3b2cbc875aa4792c1fba Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Peter Collingbourne <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-05-13bpf: Fix combination of jit blinding and pointers to bpf subprogs.Alexei Starovoitov1-0/+10
The combination of jit blinding and pointers to bpf subprogs causes: [ 36.989548] BUG: unable to handle page fault for address: 0000000100000001 [ 36.990342] #PF: supervisor instruction fetch in kernel mode [ 36.990968] #PF: error_code(0x0010) - not-present page [ 36.994859] RIP: 0010:0x100000001 [ 36.995209] Code: Unable to access opcode bytes at RIP 0xffffffd7. [ 37.004091] Call Trace: [ 37.004351] <TASK> [ 37.004576] ? bpf_loop+0x4d/0x70 [ 37.004932] ? bpf_prog_3899083f75e4c5de_F+0xe3/0x13b The jit blinding logic didn't recognize that ld_imm64 with an address of bpf subprogram is a special instruction and proceeded to randomize it. By itself it wouldn't have been an issue, but jit_subprogs() logic relies on two step process to JIT all subprogs and then JIT them again when addresses of all subprogs are known. Blinding process in the first JIT phase caused second JIT to miss adjustment of special ld_imm64. Fix this issue by ignoring special ld_imm64 instructions that don't have user controlled constants and shouldn't be blinded. Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Reported-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-05-13swiotlb: use the right nslabs-derived sizes in swiotlb_init_lateChristoph Hellwig1-8/+11
nslabs can shrink when allocations or the remap don't succeed, so make sure to use it for all sizing. For that remove the bytes value that can get stale and replace it with local calculations and a boolean to indicate if the originally requested size could not be allocated. Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]>
2022-05-13swiotlb: use the right nslabs value in swiotlb_init_remapChristoph Hellwig1-3/+4
default_nslabs should only be used to initialize nslabs, after that we need to use the local variable that can shrink when allocations or the remap don't succeed. Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]>
2022-05-13swiotlb: don't panic when the swiotlb buffer can't be allocatedChristoph Hellwig1-2/+4
For historical reasons the switlb code paniced when the metadata could not be allocated, but just printed a warning when the actual main swiotlb buffer could not be allocated. Restore this somewhat unexpected behavior as changing it caused a boot failure on the Microchip RISC-V PolarFire SoC Icicle kit. Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl") Reported-by: Conor Dooley <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Acked-by: Conor Dooley <[email protected]> Tested-by: Conor Dooley <[email protected]>
2022-05-13futex: Remove a PREEMPT_RT_FULL reference.Sebastian Andrzej Siewior1-1/+1
Earlier the PREEMPT_RT patch had a PREEMPT_RT_FULL and PREEMPT_RT_BASE Kconfig option. The latter was a subset of the functionality that was enabled with PREEMPT_RT_FULL and was mainly useful for debugging. During the merging efforts the two Kconfig options were abandoned in the v5.4.3-rt1 release and since then there is only PREEMPT_RT which enables the full features set (as PREEMPT_RT_FULL did in earlier releases). Replace the PREEMPT_RT_FULL reference with PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: André Almeida <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-12relay: remove redundant assignment to pointer bufColin Ian King1-1/+1
Pointer buf is being assigned a value that is not being read, buf is being re-assigned in the next starement. The assignment is redundant and can be removed. Cleans up clang scan build warning: kernel/relay.c:443:8: warning: Although the value stored to 'buf' is used in the enclosing expression, the value is never actually read from 'buf' [deadcode.DeadStores] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Kalle Valo <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-05-12kernel/crash_core.c: remove redundant check of ck_cmdlinelizhe1-3/+0
At the end of get_last_crashkernel(), the judgement of ck_cmdline is obviously unnecessary and causes redundance, let's clean it up. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: lizhe <[email protected]> Acked-by: Baoquan He <[email protected]> Acked-by: Philipp Rudo <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Dave Young <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski6-15/+41
No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-12Merge branch 'for-5.18-fixes' of ↵Linus Torvalds1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "Waiman's fix for a cgroup2 cpuset bug where it could miss nodes which were hot-added" * 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
2022-05-12module: merge check_exported_symbol() into find_exported_symbol_in_section()Masahiro Yamada1-15/+7
Now check_exported_symbol() always succeeds. Merge it into find_exported_symbol_in_search() to make the code concise. Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-05-12module: do not binary-search in __ksymtab_gpl if fsa->gplok is falseMasahiro Yamada1-2/+3
Currently, !fsa->gplok && syms->license == GPL_ONLY) is checked after bsearch() succeeds. It is meaningless to do the binary search in the GPL symbol table when fsa->gplok is false because we know find_exported_symbol_in_section() will fail anyway. This check should be done before bsearch(). Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-05-12module: do not pass opaque pointer for symbol searchMasahiro Yamada1-7/+4
There is no need to use an opaque pointer for check_exported_symbol() or find_exported_symbol_in_section. Pass (struct find_symbol_arg *) explicitly. Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-05-12module: show disallowed symbol name for inherit_taint()Lecopzer Chen1-6/+6
The error log for inherit_taint() doesn't really help to find the symbol which violates GPL rules. For example, if a module has 300 symbol and includes 50 disallowed symbols, the log only shows the content below and we have no idea what symbol is. AAA: module using GPL-only symbols uses symbols from proprietary module BBB. It's hard for user who doesn't really know how the symbol was parsing. This patch add symbol name to tell the offending symbols explicitly. AAA: module using GPL-only symbols uses symbols SSS from proprietary module BBB. Signed-off-by: Lecopzer Chen <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-05-12module: fix [e_shstrndx].sh_size=0 OOB accessAlexey Dobriyan1-0/+4
It is trivial to craft a module to trigger OOB access in this line: if (info->secstrings[strhdr->sh_size - 1] != '\0') { BUG: unable to handle page fault for address: ffffc90000aa0fff PGD 100000067 P4D 100000067 PUD 100066067 PMD 10436f067 PTE 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 1215 Comm: insmod Not tainted 5.18.0-rc5-00007-g9bf578647087-dirty #10 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 RIP: 0010:load_module+0x19b/0x2391 Fixes: ec2a29593c83 ("module: harden ELF info handling") Signed-off-by: Alexey Dobriyan <[email protected]> [rebased patch onto modules-next] Signed-off-by: Luis Chamberlain <[email protected]>
2022-05-12module: Introduce module unload taint trackingAaron Tomlin4-0/+88
Currently, only the initial module that tainted the kernel is recorded e.g. when an out-of-tree module is loaded. The purpose of this patch is to allow the kernel to maintain a record of each unloaded module that taints the kernel. So, in addition to displaying a list of linked modules (see print_modules()) e.g. in the event of a detected bad page, unloaded modules that carried a taint/or taints are displayed too. A tainted module unload count is maintained. The number of tracked modules is not fixed. This feature is disabled by default. Signed-off-by: Aaron Tomlin <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-05-12module: Move module_assert_mutex_or_preempt() to internal.hAaron Tomlin2-11/+12
No functional change. This patch migrates module_assert_mutex_or_preempt() to internal.h. So, the aforementiond function can be used outside of main/or core module code yet will remain restricted for internal use only. Signed-off-by: Aaron Tomlin <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-05-12module: Make module_flags_taint() accept a module's taints bitmap and usable ↵Aaron Tomlin2-4/+5
outside core code No functional change. The purpose of this patch is to modify module_flags_taint() to accept a module's taints bitmap as a parameter and modifies all users accordingly. Furthermore, it is now possible to access a given module's taint flags data outside of non-essential code yet does remain for internal use only. This is in preparation for module unload taint tracking support. Signed-off-by: Aaron Tomlin <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>