aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2021-04-27Merge tag 'seccomp-v5.13-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: - Fix "cacheable" typo in comments (Cui GaoSheng) - Fix CONFIG for /proc/$pid/status Seccomp_filters ([email protected]) * tag 'seccomp-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: seccomp: Fix "cacheable" typo in comments seccomp: Fix CONFIG tests for Seccomp_filters
2021-04-27bpf, cpumap: Bulk skb using netif_receive_skb_listLorenzo Bianconi1-9/+9
Rely on netif_receive_skb_list routine to send skbs converted from xdp_frames in cpu_map_kthread_run in order to improve i-cache usage. The proposed patch has been tested running xdp_redirect_cpu bpf sample available in the kernel tree that is used to redirect UDP frames from ixgbe driver to a cpumap entry and then to the networking stack. UDP frames are generated using pktgen. Packets are discarded by the UDP layer. $ xdp_redirect_cpu --cpu <cpu> --progname xdp_cpu_map0 --dev <eth> bpf-next: ~2.35Mpps bpf-next + cpumap skb-list: ~2.72Mpps Rename drops counter in kmem_alloc_drops since now it reports just kmem_cache_alloc_bulk failures Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/bpf/c729f83e5d7482d9329e0f165bdbe5adcefd1510.1619169700.git.lorenzo@kernel.org
2021-04-27bpf: Fix propagation of 32 bit unsigned bounds from 64 bit boundsDaniel Borkmann1-5/+3
Similarly as b02709587ea3 ("bpf: Fix propagation of 32-bit signed bounds from 64-bit bounds."), we also need to fix the propagation of 32 bit unsigned bounds from 64 bit counterparts. That is, really only set the u32_{min,max}_value when /both/ {umin,umax}_value safely fit in 32 bit space. For example, the register with a umin_value == 1 does /not/ imply that u32_min_value is also equal to 1, since umax_value could be much larger than 32 bit subregister can hold, and thus u32_min_value is in the interval [0,1] instead. Before fix, invalid tracking result of R2_w=inv1: [...] 5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0 5: (35) if r2 >= 0x1 goto pc+1 [...] // goto path 7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0 7: (b6) if w2 <= 0x1 goto pc+1 [...] // goto path 9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smin_value=-9223372036854775807,smax_value=9223372032559808513,umin_value=1,umax_value=18446744069414584321,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_max_value=1) R10=fp0 9: (bc) w2 = w2 10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv1 R10=fp0 [...] After fix, correct tracking result of R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)): [...] 5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0 5: (35) if r2 >= 0x1 goto pc+1 [...] // goto path 7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0 7: (b6) if w2 <= 0x1 goto pc+1 [...] // goto path 9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smax_value=9223372032559808513,umax_value=18446744069414584321,var_off=(0x0; 0xffffffff00000001),s32_min_value=0,s32_max_value=1,u32_max_value=1) R10=fp0 9: (bc) w2 = w2 10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)) R10=fp0 [...] Thus, same issue as in b02709587ea3 holds for unsigned subregister tracking. Also, align __reg64_bound_u32() similarly to __reg64_bound_s32() as done in b02709587ea3 to make them uniform again. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Manfred Paul (@_manfp) Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: John Fastabend <[email protected]> Acked-by: Alexei Starovoitov <[email protected]>
2021-04-27bpf: Lock bpf_trace_printk's tmp buf before it is written toFlorent Revest1-1/+1
bpf_trace_printk uses a shared static buffer to hold strings before they are printed. A recent refactoring moved the locking of that buffer after it gets filled by mistake. Fixes: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf") Reported-by: Rasmus Villemoes <[email protected]> Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-26Merge tag 'pm-5.13-rc1' of ↵Linus Torvalds6-21/+21
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add some new hardware support (for example, IceLake-D idle states in intel_idle), fix some issues (for example, the handling of negative "sleep length" values in cpuidle governors), add new functionality to the existing drivers (for example, scale-invariance support in the ACPI CPPC cpufreq driver) and clean up code all over. Specifics: - Add idle states table for IceLake-D to the intel_idle driver and update IceLake-X C6 data in it (Artem Bityutskiy). - Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and drop the unused do_idle() firmware call from it (Dmitry Osipenko). - Fix cpuidle-qcom-spm Kconfig entry (He Ying). - Fix handling of possible negative tick_nohz_get_next_hrtimer() return values of in cpuidle governors (Rafael Wysocki). - Add support for frequency-invariance to the ACPI CPPC cpufreq driver and update the frequency-invariance engine (FIE) to use it as needed (Viresh Kumar). - Simplify the default delay_us setting in the ACPI CPPC cpufreq driver (Tom Saeger). - Clean up frequency-related computations in the intel_pstate cpufreq driver (Rafael Wysocki). - Fix TBG parent setting for load levels in the armada-37xx cpufreq driver and drop the CPU PM clock .set_parent method for armada-37xx (Marek Behún). - Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár). - Fix handling of dev_pm_opp_of_cpumask_add_table() return values in cpufreq-dt to take the -EPROBE_DEFER one into acconut as appropriate (Quanyang Wang). - Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich). - Drop the unused for_each_policy() macro from cpufreq (Shaokun Zhang). - Simplify computations in the schedutil cpufreq governor to avoid unnecessary overhead (Yue Hu). - Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury). - Fix cpufreq documentation links in Kconfig (Alexander Monakov). - Fix PCI device power state handling in pci_enable_device_flags() to avoid issuse in some cases when the device depends on an ACPI power resource (Rafael Wysocki). - Add missing documentation of pm_runtime_resume_and_get() (Alan Stern). - Add missing static inline stub for pm_runtime_has_no_callbacks() to pm_runtime.h and drop the unused try_to_freeze_nowarn() definition (YueHaibing). - Drop duplicate struct device declaration from pm.h and fix a structure type declaration in intel_rapl.h (Wan Jiabing). - Use dev_set_name() instead of an open-coded equivalent of it in the wakeup sources code and drop a redundant local variable initialization from it (Andy Shevchenko, Colin Ian King). - Use crc32 instead of md5 for e820 memory map integrity check during resume from hibernation on x86 (Chris von Recklinghausen). - Fix typos in comments in the system-wide and hibernation support code (Lu Jialin). - Modify the generic power domains (genpd) code to avoid resuming devices in the "prepare" phase of system-wide suspend and hibernation (Ulf Hansson). - Add Hygon Fam18h RAPL support to the intel_rapl power capping driver (Pu Wen). - Add MAINTAINERS entry for the dynamic thermal power management (DTPM) code (Daniel Lezcano). - Add devm variants of operating performance points (OPP) API functions and switch over some users of the OPP framework to the new resource-managed API (Yangtao Li and Dmitry Osipenko). - Update devfreq core: * Register devfreq devices as cooling devices on demand (Daniel Lezcano). * Add missing unlock opeation in devfreq_add_device() (Lukasz Luba). * Use the next frequency as resume_freq instead of the previous frequency when using the opp-suspend property (Dong Aisheng). * Check get_dev_status in devfreq_update_stats() (Dong Aisheng). * Fix set_freq path for the userspace governor in Kconfig (Dong Aisheng). * Remove invalid description of get_target_freq() (Dong Aisheng). - Update devfreq drivers: * imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded of_match_ptr() (Dong Aisheng, Fabio Estevam). * rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop references to undefined symbols (Enric Balletbo i Serra, Gaël PORTAY). * rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof Kozlowski). * imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam). - Fix kernel-doc warnings in three places (Pierre-Louis Bossart). - Fix typo in the pm-graph utility code (Ricardo Ribalda)" * tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits) PM: wakeup: remove redundant assignment to variable retval PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check cpufreq: Kconfig: fix documentation links PM: wakeup: use dev_set_name() directly PM: runtime: Add documentation for pm_runtime_resume_and_get() cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits() cpufreq: armada-37xx: Fix module unloading cpufreq: armada-37xx: Remove cur_frequency variable cpufreq: armada-37xx: Fix determining base CPU frequency cpufreq: armada-37xx: Fix driver cleanup when registration failed clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0 clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz cpufreq: armada-37xx: Fix the AVS value for load L1 clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock cpufreq: armada-37xx: Fix setting TBG parent for load levels cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration cpuidle: tegra: Remove do_idle firmware call cpuidle: tegra: Fix C7 idling state on Tegra114 PM: sleep: fix typos in comments cpufreq: Remove unused for_each_policy macro ...
2021-04-26Merge tag 'arm64-upstream' of ↵Linus Torvalds1-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - MTE asynchronous support for KASan. Previously only synchronous (slower) mode was supported. Asynchronous is faster but does not allow precise identification of the illegal access. - Run kernel mode SIMD with softirqs disabled. This allows using NEON in softirq context for crypto performance improvements. The conditional yield support is modified to take softirqs into account and reduce the latency. - Preparatory patches for Apple M1: handle CPUs that only have the VHE mode available (host kernel running at EL2), add FIQ support. - arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers, new functions for the HiSilicon HHA and L3C PMU, cleanups. - Re-introduce support for execute-only user permissions but only when the EPAN (Enhanced Privileged Access Never) architecture feature is available. - Disable fine-grained traps at boot and improve the documented boot requirements. - Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC). - Add hierarchical eXecute Never permissions for all page tables. - Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs to control which PAC keys are enabled in a particular task. - arm64 kselftests for BTI and some improvements to the MTE tests. - Minor improvements to the compat vdso and sigpage. - Miscellaneous cleanups. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (86 commits) arm64/sve: Add compile time checks for SVE hooks in generic functions arm64/kernel/probes: Use BUG_ON instead of if condition followed by BUG. arm64: pac: Optimize kernel entry/exit key installation code paths arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) arm64: mte: make the per-task SCTLR_EL1 field usable elsewhere arm64/sve: Remove redundant system_supports_sve() tests arm64: fpsimd: run kernel mode NEON with softirqs disabled arm64: assembler: introduce wxN aliases for wN registers arm64: assembler: remove conditional NEON yield macros kasan, arm64: tests supports for HW_TAGS async mode arm64: mte: Report async tag faults before suspend arm64: mte: Enable async tag check fault arm64: mte: Conditionally compile mte_enable_kernel_*() arm64: mte: Enable TCO in functions that can read beyond buffer limits kasan: Add report for async mode arm64: mte: Drop arch_enable_tagging() kasan: Add KASAN mode kernel parameter arm64: mte: Add asynchronous mode support arm64: Get rid of CONFIG_ARM64_VHE arm64: Cope with CPUs stuck in VHE mode ...
2021-04-26Merge tag 'timers-core-2021-04-26' of ↵Linus Torvalds19-62/+78
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The time and timers updates contain: Core changes: - Allow runtime power management when the clocksource is changed. - A correctness fix for clock_adjtime32() so that the return value on success is not overwritten by the result of the copy to user. - Allow late installment of broadcast clockevent devices which was broken because nothing switched them over to oneshot mode. This went unnoticed so far because clockevent devices used to be built in, but now people started to make them modular. - Debugfs related simplifications - Small cleanups and improvements here and there Driver changes: - The usual set of device tree binding updates for a wide range of drivers/devices. - The usual updates and improvements for drivers all over the place but nothing outstanding. - No new clocksource/event drivers. They'll come back next time" * tag 'timers-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) posix-timers: Preserve return value in clock_adjtime32() tick/broadcast: Allow late registered device to enter oneshot mode tick: Use tick_check_replacement() instead of open coding it time/timecounter: Mark 1st argument of timecounter_cyc2time() as const dt-bindings: timer: nuvoton,npcm7xx: Add wpcm450-timer clocksource/drivers/arm_arch_timer: Add __ro_after_init and __init clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940 clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue clocksource/drivers/dw_apb_timer_of: Add handling for potential memory leak clocksource/drivers/npcm: Add support for WPCM450 clocksource/drivers/sh_cmt: Don't use CMTOUT_IE with R-Car Gen2/3 clocksource/drivers/pistachio: Fix trivial typo clocksource/drivers/ingenic_ost: Fix return value check in ingenic_ost_probe() clocksource/drivers/timer-ti-dm: Add missing set_state_oneshot_stopped clocksource/drivers/timer-ti-dm: Fix posted mode status check order dt-bindings: timer: renesas,cmt: Document R8A77961 dt-bindings: timer: renesas,cmt: Add r8a779a0 CMT support clocksource/drivers/ingenic-ost: Add support for the JZ4760B clocksource/drivers/ingenic: Add support for the JZ4760 dt-bindings: timer: ingenic: Add compatible strings for JZ4760(B) ...
2021-04-26Merge tag 'irq-core-2021-04-26' of ↵Linus Torvalds19-157/+360
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The usual updates from the irq departement: Core changes: - Provide IRQF_NO_AUTOEN as a flag for request*_irq() so drivers can be cleaned up which either use a seperate mechanism to prevent auto-enable at request time or have a racy mechanism which disables the interrupt right after request. - Get rid of the last usage of irq_create_identity_mapping() and remove the interface. - An overhaul of tasklet_disable(). Most usage sites of tasklet_disable() are in task context and usually in cleanup, teardown code pathes. tasklet_disable() spinwaits for a tasklet which is currently executed. That's not only a problem for PREEMPT_RT where this can lead to a live lock when the disabling task preempts the softirq thread. It's also problematic in context of virtualization when the vCPU which runs the tasklet is scheduled out and the disabling code has to spin wait until it's scheduled back in. There are a few code pathes which invoke tasklet_disable() from non-sleepable context. For these a new disable variant which still spinwaits is provided which allows to switch tasklet_disable() to a sleep wait mechanism. For the atomic use cases this does not solve the live lock issue on PREEMPT_RT. That is mitigated by blocking on the RT specific softirq lock. - The PREEMPT_RT specific implementation of softirq processing and local_bh_disable/enable(). On RT enabled kernels soft interrupt processing happens always in task context and all interrupt handlers, which are not explicitly marked to be invoked in hard interrupt context are forced into task context as well. This allows to protect against softirq processing with a per CPU lock, which in turn allows to make BH disabled regions preemptible. Most of the softirq handling code is still shared. The RT/non-RT specific differences are addressed with a set of inline functions which provide the context specific functionality. The local_bh_disable() / local_bh_enable() mechanism are obviously seperate. - The usual set of small improvements and cleanups Driver changes: - New drivers for Nuvoton WPCM450 and DT 79rc3243x interrupt controllers - Extended functionality for MStar, STM32 and SC7280 irq chips - Enhanced robustness for ARM GICv3/4.1 drivers - The usual set of cleanups and improvements all over the place" * tag 'irq-core-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) irqchip/xilinx: Expose Kconfig option for Zynq/ZynqMP irqchip/gic-v3: Do not enable irqs when handling spurious interrups dt-bindings: interrupt-controller: Add IDT 79RC3243x Interrupt Controller irqchip: Add support for IDT 79rc3243x interrupt controller irqdomain: Drop references to recusive irqdomain setup irqdomain: Get rid of irq_create_strict_mappings() irqchip/jcore-aic: Kill use of irq_create_strict_mappings() ARM: PXA: Kill use of irq_create_strict_mappings() irqchip/gic-v4.1: Disable vSGI upon (GIC CPUIF < v4.1) detection irqchip/tb10x: Use 'fallthrough' to eliminate a warning genirq: Reduce irqdebug cacheline bouncing kernel: Initialize cpumask before parsing irqchip/wpcm450: Drop COMPILE_TEST irqchip/irq-mst: Support polarity configuration irqchip: Add driver for WPCM450 interrupt controller dt-bindings: interrupt-controller: Add nuvoton, wpcm450-aic dt-bindings: qcom,pdc: Add compatible for sc7280 irqchip/stm32: Add usart instances exti direct event support irqchip/gic-v3: Fix OF_BAD_ADDR error handling irqchip/sifive-plic: Mark two global variables __ro_after_init ...
2021-04-26Merge tag 'core-entry-2021-04-26' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core entry updates from Thomas Gleixner: "A trivial cleanup of typo fixes" * tag 'core-entry-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Fix typos in comments
2021-04-26Merge tag 'tomoyo-pr-20210426' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1Linus Torvalds2-5/+5
Pull lockdep capacity limit updates from Tetsuo Handa: "syzbot is occasionally reporting that fuzz testing is terminated due to hitting upper limits lockdep can track. Analysis via /proc/lockdep* did not show any obvious culprits, allow tuning tracing capacity constants" * tag 'tomoyo-pr-20210426' of git://git.osdn.net/gitroot/tomoyo/tomoyo-test1: lockdep: Allow tuning tracing capacity constants.
2021-04-26Merge branches 'pm-core', 'pm-pci', 'pm-sleep', 'pm-domains' and 'powercap'Rafael J. Wysocki3-3/+3
* pm-core: PM: runtime: Add documentation for pm_runtime_resume_and_get() PM: runtime: Replace inline function pm_runtime_callbacks_present() PM: core: Remove duplicate declaration from header file * pm-pci: PCI: PM: Do not read power state in pci_enable_device_flags() * pm-sleep: PM: wakeup: remove redundant assignment to variable retval PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check PM: wakeup: use dev_set_name() directly PM: sleep: fix typos in comments freezer: Remove unused inline function try_to_freeze_nowarn() * pm-domains: PM: domains: Don't runtime resume devices at genpd_prepare() * powercap: powercap: RAPL: Fix struct declaration in header file MAINTAINERS: Add DTPM subsystem maintainer powercap: Add Hygon Fam18h RAPL support
2021-04-26Merge branch 'pm-cpufreq'Rafael J. Wysocki2-17/+13
* pm-cpufreq: (22 commits) cpufreq: Kconfig: fix documentation links cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits() cpufreq: armada-37xx: Fix module unloading cpufreq: armada-37xx: Remove cur_frequency variable cpufreq: armada-37xx: Fix determining base CPU frequency cpufreq: armada-37xx: Fix driver cleanup when registration failed clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0 clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz cpufreq: armada-37xx: Fix the AVS value for load L1 clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock cpufreq: armada-37xx: Fix setting TBG parent for load levels cpufreq: Remove unused for_each_policy macro cpufreq: dt: dev_pm_opp_of_cpumask_add_table() may return -EPROBE_DEFER cpufreq: intel_pstate: Clean up frequency computations cpufreq: cppc: simplify default delay_us setting cpufreq: Rudimentary typos fix in the file s5pv210-cpufreq.c cpufreq: CPPC: Add support for frequency invariance ia64: fix format string for ia64-acpi-cpu-freq cpufreq: schedutil: Call sugov_update_next_freq() before check to fast_switch_enabled arch_topology: Export arch_freq_scale and helpers ...
2021-04-25bpf: Allow trampoline re-attach for tracing and lsm programsJiri Olsa2-8/+19
Currently we don't allow re-attaching of trampolines. Once it's detached, it can't be re-attach even when the program is still loaded. Adding the possibility to re-attach the loaded tracing and lsm programs. Fixing missing unlock with proper cleanup goto jump reported by Julia. Reported-by: kernel test robot <[email protected]> Reported-by: Julia Lawall <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: KP Singh <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller6-340/+435
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-04-23 The following pull-request contains BPF updates for your *net-next* tree. We've added 69 non-merge commits during the last 22 day(s) which contain a total of 69 files changed, 3141 insertions(+), 866 deletions(-). The main changes are: 1) Add BPF static linker support for extern resolution of global, from Andrii. 2) Refine retval for bpf_get_task_stack helper, from Dave. 3) Add a bpf_snprintf helper, from Florent. 4) A bunch of miscellaneous improvements from many developers. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-04-25kernel: always initialize task->pf_io_worker to NULLStefan Metzmacher1-0/+1
Otherwise io_wq_worker_{running,sleeping}() may dereference an invalid pointer (in future). Currently all users of create_io_thread() are fine and get task->pf_io_worker = NULL implicitly from the wq_manager, which got it either from the userspace thread of the sq_thread, which explicitly reset it to NULL. I think it's safer to always reset it in order to avoid future problems. Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task") cc: Jens Axboe <[email protected]> Signed-off-by: Stefan Metzmacher <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-04-25Merge tag 'locking_urgent_for_v5.12' of ↵Linus Torvalds1-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: "Fix ordering in the queued writer lock's slowpath" * tag 'locking_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
2021-04-25Merge tag 'sched_urgent_for_v5.12' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: "Fix a typo in a macro ifdeffery" * tag 'sched_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: preempt/dynamic: Fix typo in macro conditional statement
2021-04-25kbuild: redo fake deps at include/config/*.hAlexey Dobriyan1-1/+1
Make include/config/foo/bar.h fake deps files generation simpler. * delete .h suffix those aren't header files, shorten filenames, * delete tolower() Linux filesystems can deal with both upper and lowercase filenames very well, * put everything in 1 directory Presumably 'mkdir -p' split is from dark times when filesystems handled huge directories badly, disks were round adding to seek times. x86_64 allmodconfig lists 12364 files in include/config. ../obj/include/config/ ├── 104_QUAD_8 ├── 60XX_WDT ├── 64BIT ... ├── ZSWAP_DEFAULT_ON ├── ZSWAP_ZPOOL_DEFAULT └── ZSWAP_ZPOOL_DEFAULT_ZBUD 0 directories, 12364 files Signed-off-by: Alexey Dobriyan <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2021-04-24Merge tag 'irqchip-5.13' of ↵Thomas Gleixner1-46/+6
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip and irqdomain updates from Marc Zyngier: New HW support: - New driver for the Nuvoton WPCM450 interrupt controller - New driver for the IDT 79rc3243x interrupt controller - Add support for interrupt trigger configuration to the MStar irqchip - Add more external interrupt support to the STM32 irqchip - Add new compatible strings for QCOM SC7280 to the qcom-pdc binding Fixes and cleanups: - Drop irq_create_strict_mappings() and irq_create_identity_mapping() from the irqdomain API, with cleanups in a couple of drivers - Fix nested NMI issue with spurious interrupts on GICv3 - Don't allow GICv4.1 vSGIs when the CPU doesn't support them - Various cleanups and minor fixes Link: https://lore.kernel.org/r/[email protected]
2021-04-23bpf: Remove unnecessary map checks for ARG_PTR_TO_CONST_STRFlorent Revest1-2/+1
reg->type is enforced by check_reg_type() and map should never be NULL (it would already have been dereferenced anyway) so these checks are unnecessary. Reported-by: Alexei Starovoitov <[email protected]> Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-23bpf: Notify user if we ever hit a bpf_snprintf verifier bugFlorent Revest1-2/+4
In check_bpf_snprintf_call(), a map_direct_value_addr() of the fmt map should never fail because it has already been checked by ARG_PTR_TO_CONST_STR. But if it ever fails, it's better to error out with an explicit debug message rather than silently fail. Reported-by: Alexei Starovoitov <[email protected]> Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-23Merge tag 'kvmarm-5.13' of ↵Paolo Bonzini3-5/+3
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for Linux 5.13 New features: - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler - Alexandru is now a reviewer (not really a new feature...) Fixes: - Proper emulation of the GICR_TYPER register - Handle the complete set of relocation in the nVHE EL2 object - Get rid of the oprofile dependency in the PMU code (and of the oprofile body parts at the same time) - Debug and SPE fixes - Fix vcpu reset
2021-04-23signal, perf: Add missing TRAP_PERF case in siginfo_layout()Marco Elver1-0/+2
Add the missing TRAP_PERF case in siginfo_layout() for interpreting the layout correctly as SIL_PERF_EVENT instead of just SIL_FAULT. This ensures the si_perf field is copied and not just the si_addr field. This was caught and tested by running the perf_events/sigtrap_threads kselftest as a 32-bit binary with a 64-bit kernel. Fixes: fb6cc127e0b6 ("signal: Introduce TRAP_PERF si_code and si_perf to siginfo") Signed-off-by: Marco Elver <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-04-22landlock: Add syscall implementationsMickaël Salaün1-0/+5
These 3 system calls are designed to be used by unprivileged processes to sandbox themselves: * landlock_create_ruleset(2): Creates a ruleset and returns its file descriptor. * landlock_add_rule(2): Adds a rule (e.g. file hierarchy access) to a ruleset, identified by the dedicated file descriptor. * landlock_restrict_self(2): Enforces a ruleset on the calling thread and its future children (similar to seccomp). This syscall has the same usage restrictions as seccomp(2): the caller must have the no_new_privs attribute set or have CAP_SYS_ADMIN in the current user namespace. All these syscalls have a "flags" argument (not currently used) to enable extensibility. Here are the motivations for these new syscalls: * A sandboxed process may not have access to file systems, including /dev, /sys or /proc, but it should still be able to add more restrictions to itself. * Neither prctl(2) nor seccomp(2) (which was used in a previous version) fit well with the current definition of a Landlock security policy. All passed structs (attributes) are checked at build time to ensure that they don't contain holes and that they are aligned the same way for each architecture. See the user and kernel documentation for more details (provided by a following commit): * Documentation/userspace-api/landlock.rst * Documentation/security/landlock.rst Cc: Arnd Bergmann <[email protected]> Cc: James Morris <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Mickaël Salaün <[email protected]> Acked-by: Serge Hallyn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: James Morris <[email protected]>
2021-04-22irqdomain: Drop references to recusive irqdomain setupMarc Zyngier1-6/+2
It was never completely implemented, and was removed a long time ago. Adjust the documentation to reflect this. Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-04-22irqdomain: Get rid of irq_create_strict_mappings()Marc Zyngier1-32/+0
No user of this helper is left, remove it. Signed-off-by: Marc Zyngier <[email protected]>
2021-04-22Merge branch 'kvm-arm64/kill_oprofile_dependency' into kvmarm-master/nextMarc Zyngier1-5/+0
Signed-off-by: Marc Zyngier <[email protected]>
2021-04-22kcsan: Fix printk format stringArnd Bergmann1-1/+1
Printing a 'long' variable using the '%d' format string is wrong and causes a warning from gcc: kernel/kcsan/kcsan_test.c: In function 'nthreads_gen_params': include/linux/kern_levels.h:5:25: error: format '%d' expects argument of type 'int', but argument 3 has type 'long int' [-Werror=format=] Use the appropriate format modifier. Fixes: f6a149140321 ("kcsan: Switch to KUNIT_CASE_PARAM for parameterized tests") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Marco Elver <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-04-22perf: Get rid of oprofile leftoversMarc Zyngier1-5/+0
perf_pmu_name() and perf_num_counters() are unused. Drop them. Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-04-21cpumask/hotplug: Fix cpu_dying() state trackingPeter Zijlstra1-5/+11
Vincent reported that for states with a NULL startup/teardown function we do not call cpuhp_invoke_callback() (because there is none) and as such we'll not update the cpu_dying() state. The stale cpu_dying() can eventually lead to triggering BUG(). Rectify this by updating cpu_dying() in the exact same places the hotplug machinery tracks its directional state, namely cpuhp_set_state() and cpuhp_reset_state(). Reported-by: Vincent Donnefort <[email protected]> Suggested-by: Vincent Donnefort <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Donnefort <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-04-21kthread: Fix PF_KTHREAD vs to_kthread() racePeter Zijlstra3-8/+29
The kthread_is_per_cpu() construct relies on only being called on PF_KTHREAD tasks (per the WARN in to_kthread). This gives rise to the following usage pattern: if ((p->flags & PF_KTHREAD) && kthread_is_per_cpu(p)) However, as reported by syzcaller, this is broken. The scenario is: CPU0 CPU1 (running p) (p->flags & PF_KTHREAD) // true begin_new_exec() me->flags &= ~(PF_KTHREAD|...); kthread_is_per_cpu(p) to_kthread(p) WARN(!(p->flags & PF_KTHREAD) <-- *SPLAT* Introduce __to_kthread() that omits the WARN and is sure to check both values. Use this to remove the problematic pattern for kthread_is_per_cpu() and fix a number of other kthread_*() functions that have similar issues but are currently not used in ways that would expose the problem. Notably kthread_func() is only ever called on 'current', while kthread_probe_data() is only used for PF_WQ_WORKER, which implies the task is from kthread_create*(). Fixes: ac687e6e8c26 ("kthread: Extract KTHREAD_IS_PER_CPU") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-04-21sched/debug: Fix cgroup_path[] serializationWaiman Long1-13/+29
The handling of sysrq key can be activated by echoing the key to /proc/sysrq-trigger or via the magic key sequence typed into a terminal that is connected to the system in some way (serial, USB or other mean). In the former case, the handling is done in a user context. In the latter case, it is likely to be in an interrupt context. Currently in print_cpu() of kernel/sched/debug.c, sched_debug_lock is taken with interrupt disabled for the whole duration of the calls to print_*_stats() and print_rq() which could last for the quite some time if the information dump happens on the serial console. If the system has many cpus and the sched_debug_lock is somehow busy (e.g. parallel sysrq-t), the system may hit a hard lockup panic depending on the actually serial console implementation of the system. The purpose of sched_debug_lock is to serialize the use of the global cgroup_path[] buffer in print_cpu(). The rests of the printk calls don't need serialization from sched_debug_lock. Calling printk() with interrupt disabled can still be problematic if multiple instances are running. Allocating a stack buffer of PATH_MAX bytes is not feasible because of the limited size of the kernel stack. The solution implemented in this patch is to allow only one caller at a time to use the full size group_path[], while other simultaneous callers will have to use shorter stack buffers with the possibility of path name truncation. A "..." suffix will be printed if truncation may have happened. The cgroup path name is provided for informational purpose only, so occasional path name truncation should not be a big problem. Fixes: efe25c2c7b3a ("sched: Reinstate group names in /proc/sched_debug") Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-04-21sched,psi: Handle potential task count underflow bugs more gracefullyCharan Teja Reddy1-2/+3
psi_group_cpu->tasks, represented by the unsigned int, stores the number of tasks that could be stalled on a psi resource(io/mem/cpu). Decrementing these counters at zero leads to wrapping which further leads to the psi_group_cpu->state_mask is being set with the respective pressure state. This could result into the unnecessary time sampling for the pressure state thus cause the spurious psi events. This can further lead to wrong actions being taken at the user land based on these psi events. Though psi_bug is set under these conditions but that just for debug purpose. Fix it by decrementing the ->tasks count only when it is non-zero. Signed-off-by: Charan Teja Reddy <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Johannes Weiner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-04-21sched: Warn on long periods of pending need_reschedPaul Turner4-1/+94
CPU scheduler marks need_resched flag to signal a schedule() on a particular CPU. But, schedule() may not happen immediately in cases where the current task is executing in the kernel mode (no preemption state) for extended periods of time. This patch adds a warn_on if need_resched is pending for more than the time specified in sysctl resched_latency_warn_ms. If it goes off, it is likely that there is a missing cond_resched() somewhere. Monitoring is done via the tick and the accuracy is hence limited to jiffy scale. This also means that we won't trigger the warning if the tick is disabled. This feature (LATENCY_WARN) is default disabled. Signed-off-by: Paul Turner <[email protected]> Signed-off-by: Josh Don <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-04-20Merge tag 'trace-v5.12-rc8' of ↵Linus Torvalds1-3/+7
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix tp_printk command line and trace events Masami added a wrapper to be able to unhash trace event pointers as they are only read by root anyway, and they can also be extracted by the raw trace data buffers. But this wrapper utilized the iterator to have a temporary buffer to manipulate the text with. tp_printk is a kernel command line option that will send the trace output of a trace event to the console on boot up (useful when the system crashes before finishing the boot). But the code used the same wrapper that Masami added, and its iterator did not have a buffer, and this caused the system to crash. Have the wrapper just print the trace event normally if the iterator has no temporary buffer" * tag 'trace-v5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix checking event hash pointer logic when tp_printk is enabled
2021-04-20capabilities: require CAP_SETFCAP to map uid 0Serge E. Hallyn1-3/+62
cap_setfcap is required to create file capabilities. Since commit 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a process running as uid 0 but without cap_setfcap is able to work around this as follows: unshare a new user namespace which maps parent uid 0 into the child namespace. While this task will not have new capabilities against the parent namespace, there is a loophole due to the way namespaced file capabilities are represented as xattrs. File capabilities valid in userns 1 are distinguished from file capabilities valid in userns 2 by the kuid which underlies uid 0. Therefore the restricted root process can unshare a new self-mapping namespace, add a namespaced file capability onto a file, then use that file capability in the parent namespace. To prevent that, do not allow mapping parent uid 0 if the process which opened the uid_map file does not have CAP_SETFCAP, which is the capability for setting file capabilities. As a further wrinkle: a task can unshare its user namespace, then open its uid_map file itself, and map (only) its own uid. In this case we do not have the credential from before unshare, which was potentially more restricted. So, when creating a user namespace, we record whether the creator had CAP_SETFCAP. Then we can use that during map_write(). With this patch: 1. Unprivileged user can still unshare -Ur ubuntu@caps:~$ unshare -Ur root@caps:~# logout 2. Root user can still unshare -Ur ubuntu@caps:~$ sudo bash root@caps:/home/ubuntu# unshare -Ur root@caps:/home/ubuntu# logout 3. Root user without CAP_SETFCAP cannot unshare -Ur: root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap -- root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap unable to set CAP_SETFCAP effective capability: Operation not permitted root@caps:/home/ubuntu# unshare -Ur unshare: write failed /proc/self/uid_map: Operation not permitted Note: an alternative solution would be to allow uid 0 mappings by processes without CAP_SETFCAP, but to prevent such a namespace from writing any file capabilities. This approach can be seen at [1]. Background history: commit 95ebabde382 ("capabilities: Don't allow writing ambiguous v3 file capabilities") tried to fix the issue by preventing v3 fscaps to be written to disk when the root uid would map to the same uid in nested user namespaces. This led to regressions for various workloads. For example, see [2]. Ultimately this is a valid use-case we have to support meaning we had to revert this change in 3b0c2d3eaa83 ("Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file capabilities")"). Link: https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4 [1] Link: https://github.com/containers/buildah/issues/3071 [2] Signed-off-by: Serge Hallyn <[email protected]> Reviewed-by: Andrew G. Morgan <[email protected]> Tested-by: Christian Brauner <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Tested-by: Giuseppe Scrivano <[email protected]> Cc: Eric Biederman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-20tracing: Fix checking event hash pointer logic when tp_printk is enabledSteven Rostedt (VMware)1-3/+7
Pointers in events that are printed are unhashed if the flags allow it, and the logic to do so is called before processing the event output from the raw ring buffer. In most cases, this is done when a user reads one of the trace files. But if tp_printk is added on the kernel command line, this logic is done for trace events when they are triggered, and their output goes out via printk. The unhash logic (and even the validation of the output) did not support the tp_printk output, and would crash. Link: https://lore.kernel.org/linux-tegra/[email protected]/ Fixes: efbbdaa22bb7 ("tracing: Show real address for trace event arguments") Reported-by: Jon Hunter <[email protected]> Tested-by: Jon Hunter <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-04-20sched/fair: Move update_nohz_stats() to the CONFIG_NO_HZ_COMMON block to ↵YueHaibing1-22/+18
simplify the code & fix an unused function warning When !CONFIG_NO_HZ_COMMON we get this new GCC warning: kernel/sched/fair.c:8398:13: warning: ‘update_nohz_stats’ defined but not used [-Wunused-function] Move update_nohz_stats() to an already existing CONFIG_NO_HZ_COMMON #ifdef block. Beyond fixing the GCC warning, this also simplifies the update_nohz_stats() function. [ mingo: Rewrote the changelog. ] Fixes: 0826530de3cb ("sched/fair: Remove update of blocked load from newidle_balance") Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-04-20Merge tag 'v5.12-rc8' into sched/core, to pick up fixesIngo Molnar36-295/+805
Signed-off-by: Ingo Molnar <[email protected]>
2021-04-19bpf: Refine retval for bpf_get_task_stack helperDave Marchevsky1-0/+1
Verifier can constrain the min/max bounds of bpf_get_task_stack's return value more tightly than the default tnum_unknown. Like bpf_get_stack, return value is num bytes written into a caller-supplied buf, or error, so do_refine_retval_range will work. Signed-off-by: Dave Marchevsky <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-19bpf: Add a bpf_snprintf helperFlorent Revest3-0/+93
The implementation takes inspiration from the existing bpf_trace_printk helper but there are a few differences: To allow for a large number of format-specifiers, parameters are provided in an array, like in bpf_seq_printf. Because the output string takes two arguments and the array of parameters also takes two arguments, the format string needs to fit in one argument. Thankfully, ARG_PTR_TO_CONST_STR is guaranteed to point to a zero-terminated read-only map so we don't need a format string length arg. Because the format-string is known at verification time, we also do a first pass of format string validation in the verifier logic. This makes debugging easier. Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-19bpf: Add a ARG_PTR_TO_CONST_STR argument typeFlorent Revest1-0/+41
This type provides the guarantee that an argument is going to be a const pointer to somewhere in a read-only map value. It also checks that this pointer is followed by a zero character before the end of the map value. Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-19bpf: Factorize bpf_trace_printk and bpf_seq_printfFlorent Revest2-334/+293
Two helpers (trace_printk and seq_printf) have very similar implementations of format string parsing and a third one is coming (snprintf). To avoid code duplication and make the code easier to maintain, this moves the operations associated with format string parsing (validation and argument sanitization) into one generic function. The implementation of the two existing helpers already drifted quite a bit so unifying them entailed a lot of changes: - bpf_trace_printk always expected fmt[fmt_size] to be the terminating NULL character, this is no longer true, the first 0 is terminating. - bpf_trace_printk now supports %% (which produces the percentage char). - bpf_trace_printk now skips width formating fields. - bpf_trace_printk now supports the X modifier (capital hexadecimal). - bpf_trace_printk now supports %pK, %px, %pB, %pi4, %pI4, %pi6 and %pI6 - argument casting on 32 bit has been simplified into one macro and using an enum instead of obscure int increments. - bpf_seq_printf now uses bpf_trace_copy_string instead of strncpy_from_kernel_nofault and handles the %pks %pus specifiers. - bpf_seq_printf now prints longs correctly on 32 bit architectures. - both were changed to use a global per-cpu tmp buffer instead of one stack buffer for trace_printk and 6 small buffers for seq_printf. - to avoid per-cpu buffer usage conflict, these helpers disable preemption while the per-cpu buffer is in use. - both helpers now support the %ps and %pS specifiers to print symbols. The implementation is also moved from bpf_trace.c to helpers.c because the upcoming bpf_snprintf helper will be made available to all BPF programs and will need it. Signed-off-by: Florent Revest <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-19Revert "gcov: clang: fix clang-11+ build"Linus Torvalds1-1/+1
This reverts commit 04c53de57cb6435738961dace8b1b71d3ecd3c39. Nathan Chancellor points out that it should not have been merged into mainline by itself. It was a fix for "gcov: use kvmalloc()", which is still in -mm/-next. Merging it alone has broken the build. Link: https://github.com/ClangBuiltLinux/continuous-integration2/runs/2384465683?check_suite_focus=true Reported-by: Nathan Chancellor <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-04-19perf: Extend PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHEKan Liang1-3/+16
Current Hardware events and Hardware cache events have special perf types, PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE. The two types don't pass the PMU type in the user interface. For a hybrid system, the perf subsystem doesn't know which PMU the events belong to. The first capable PMU will always be assigned to the events. The events never get a chance to run on the other capable PMUs. Extend the two types to become PMU aware types. The PMU type ID is stored at attr.config[63:32]. Add a new PMU capability, PERF_PMU_CAP_EXTENDED_HW_TYPE, to indicate a PMU which supports the extended PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE. The PMU type is only required when searching a specific PMU. The PMU specific codes will only be interested in the 'real' config value, which is stored in the low 32 bit of the event->attr.config. Update the event->attr.config in the generic code, so the PMU specific codes don't need to calculate it separately. If a user specifies a PMU type, but the PMU doesn't support the extended type, error out. If an event cannot be initialized in a PMU specified by a user, error out immediately. Perf should not try to open it on other PMUs. The new PMU capability is only set for the X86 hybrid PMUs for now. Other architectures, e.g., ARM, may need it as well. The support on ARM may be implemented later separately. Suggested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-04-19preempt/dynamic: Fix typo in macro conditional statementZhouyi Zhou1-1/+1
Commit 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call") tried to provide irqentry_exit_cond_resched() static call in irqentry_exit, but has a typo in macro conditional statement. Fixes: 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call") Signed-off-by: Zhouyi Zhou <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2021-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski4-89/+183
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c - keep the ZC code, drop the code related to reinit net/bridge/netfilter/ebtables.c - fix build after move to net_generic Signed-off-by: Jakub Kicinski <[email protected]>
2021-04-17Merge tag 'net-5.12-rc8' of ↵Linus Torvalds1-74/+156
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.12-rc8, including fixes from netfilter, and bpf. BPF verifier changes stand out, otherwise things have slowed down. Current release - regressions: - gro: ensure frag0 meets IP header alignment - Revert "net: stmmac: re-init rx buffers when mac resume back" - ethernet: macb: fix the restore of cmp registers Previous releases - regressions: - ixgbe: Fix NULL pointer dereference in ethtool loopback test - ixgbe: fix unbalanced device enable/disable in suspend/resume - phy: marvell: fix detection of PHY on Topaz switches - make tcp_allowed_congestion_control readonly in non-init netns - xen-netback: Check for hotplug-status existence before watching Previous releases - always broken: - bpf: mitigate a speculative oob read of up to map value size by tightening the masking window - sctp: fix race condition in sctp_destroy_sock - sit, ip6_tunnel: Unregister catch-all devices - netfilter: nftables: clone set element expression template - netfilter: flowtable: fix NAT IPv6 offload mangling - net: geneve: check skb is large enough for IPv4/IPv6 header - netlink: don't call ->netlink_bind with table lock held" * tag 'net-5.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits) netlink: don't call ->netlink_bind with table lock held MAINTAINERS: update my email bpf: Update selftests to reflect new error states bpf: Tighten speculative pointer arithmetic mask bpf: Move sanitize_val_alu out of op switch bpf: Refactor and streamline bounds check into helper bpf: Improve verifier error messages for users bpf: Rework ptr_limit into alu_limit and add common error path bpf: Ensure off_reg has no mixed signed bounds for all types bpf: Move off_reg into sanitize_ptr_alu bpf: Use correct permission flag for mixed signed bounds arithmetic ch_ktls: do not send snd_una update to TCB in middle ch_ktls: tcb close causes tls connection failure ch_ktls: fix device connection close ch_ktls: Fix kernel panic i40e: fix the panic when running bpf in xdpdrv mode net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta net/mlx5e: Fix setting of RS FEC mode net/mlx5: Fix setting of devlink traps in switchdev mode Revert "net: stmmac: re-init rx buffers when mac resume back" ...
2021-04-17posix-timers: Preserve return value in clock_adjtime32()Chen Jun1-2/+2
The return value on success (>= 0) is overwritten by the return value of put_old_timex32(). That works correct in the fault case, but is wrong for the success case where put_old_timex32() returns 0. Just check the return value of put_old_timex32() and return -EFAULT in case it is not zero. [ tglx: Massage changelog ] Fixes: 3a4d44b61625 ("ntp: Move adjtimex related compat syscalls to native counterparts") Signed-off-by: Chen Jun <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Richard Cochran <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2021-04-17locking/qrwlock: Fix ordering in queued_write_lock_slowpath()Ali Saidi1-3/+4
While this code is executed with the wait_lock held, a reader can acquire the lock without holding wait_lock. The writer side loops checking the value with the atomic_cond_read_acquire(), but only truly acquires the lock when the compare-and-exchange is completed successfully which isn’t ordered. This exposes the window between the acquire and the cmpxchg to an A-B-A problem which allows reads following the lock acquisition to observe values speculatively before the write lock is truly acquired. We've seen a problem in epoll where the reader does a xchg while holding the read lock, but the writer can see a value change out from under it. Writer | Reader -------------------------------------------------------------------------------- ep_scan_ready_list() | |- write_lock_irq() | |- queued_write_lock_slowpath() | |- atomic_cond_read_acquire() | | read_lock_irqsave(&ep->lock, flags); --> (observes value before unlock) | chain_epi_lockless() | | epi->next = xchg(&ep->ovflist, epi); | | read_unlock_irqrestore(&ep->lock, flags); | | | atomic_cmpxchg_relaxed() | |-- READ_ONCE(ep->ovflist); | A core can order the read of the ovflist ahead of the atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire semantics addresses this issue at which point the atomic_cond_read can be switched to use relaxed semantics. Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock") Signed-off-by: Ali Saidi <[email protected]> [peterz: use try_cmpxchg()] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Steve Capper <[email protected]> Acked-by: Will Deacon <[email protected]> Acked-by: Waiman Long <[email protected]> Tested-by: Steve Capper <[email protected]>