Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 confidential computing updates from Borislav Petkov:
"Unrelated x86/cc changes queued here to avoid ugly cross-merges and
conflicts:
- Carve out CPU hotplug function declarations into a separate header
with the goal to be able to use the lockdep assertions in a more
flexible manner
- As a result, refactor cacheinfo code after carving out a function
to return the cache ID associated with a given cache level
- Cleanups
Add support to be able to kexec TDX guests:
- Expand ACPI MADT CPU offlining support
- Add machinery to prepare CoCo guests memory before kexec-ing into a
new kernel
- Cleanup, readjust and massage related code"
* tag 'x86_cc_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed
x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method
x86/mm: Introduce kernel_ident_mapping_free()
x86/smp: Add smp_ops.stop_this_cpu() callback
x86/acpi: Do not attempt to bring up secondary CPUs in the kexec case
x86/acpi: Rename fields in the acpi_madt_multiproc_wakeup structure
x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump
x86/mm: Make e820__end_ram_pfn() cover E820_TYPE_ACPI ranges
x86/tdx: Convert shared memory back to private on kexec
x86/mm: Add callbacks to prepare encrypted memory for kexec
x86/tdx: Account shared memory
x86/mm: Return correct level from lookup_address() if pte is none
x86/mm: Make x86_platform.guest.enc_status_change_*() return an error
x86/kexec: Keep CR4.MCE set during kexec for TDX guest
x86/relocate_kernel: Use named labels for less confusion
cpu/hotplug, x86/acpi: Disable CPU offlining for ACPI MADT wakeup
cpu/hotplug: Add support for declaring CPU offlining not supported
x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init
x86/acpi: Extract ACPI MADT wakeup code into a separate file
x86/kexec: Remove spurious unconditional JMP from from identity_mapped()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"The majority of added lines are two new modules: the GPIO virtual
consumer module that improves our ability to add automated tests for
the kernel API and the "sloppy" logic analyzer module that uses the
GPIO API to implement a coarse-grained debugging tool for useful for
remote development.
Other than that we have the usual assortment of various driver
extensions, improvements to the core GPIO code, DT-bindings and other
documentation updates as well as an extension to the interrupt
simulator:
GPIOLIB core:
- rework kfifo handling rework in the character device code
- improve the labeling of GPIOs requested as interrupts and show more
info on interrupt-only GPIOs in debugfs
- remove unused APIs
- unexport interfaces that are only used from the core GPIO code
- drop the return value from gpiochip_set_desc_names() as it cannot
fail
- move a string array definition out of a header and into a specific
compilation unit
- convert the last user of gpiochip_get_desc() other than GPIO core
to using a safer alternative
- use array_index_nospec() where applicable
New drivers:
- add a "virtual GPIO consumer" module that allows requesting GPIOs
from actual hardware and driving tests of the in-kernel GPIO API
from user-space over debugfs
- add a GPIO-based "sloppy" logic analyzer module useful for "first
glance" debugging on remote boards
Driver improvements:
- add support for a new model to gpio-pca953x
- lock GPIOs as interrupts in gpio-sim when the lines are requested
as irqs via the simulator domain + some other minor improvements
- improve error reporting in gpio-syscon
- convert gpio-ath79 to using dynamic GPIO base and range
- use pcibios_err_to_errno() for converting PCIBIOS error codes to
errno vaues in gpio-amd8111 and gpio-rdc321x
- allow building gpio-brcmstb for the BCM2835 architecture
DT bindings:
- convert DT bindings for lsi,zevio, mpc8xxx, and atmel to DT schema
- document new properties for aspeed,gpio, fsl,qoriq-gpio and
gpio-vf610
- document new compatibles for pca953x and fsl,qoriq-gpio
Documentation:
- document stricter behavior of the GPIO character device uAPI with
regards to reconfiguring requested line without direction set
- clarify the effect of the active-low flag on line values and edges
- remove documentation for the legacy GPIO API in order to stop
tempting people to use it
- document the preference for using pread() for reading edge events
in the sysfs API
Other:
- add an extended initializer to the interrupt simulator allowing to
specify a number of callbacks callers can use to be notified about
irqs being requested and released"
* tag 'gpio-updates-for-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (41 commits)
gpio: mc33880: Convert comma to semicolon
gpio: virtuser: actually use the "trimmed" local variable
dt-bindings: gpio: convert Atmel GPIO to json-schema
gpio: virtuser: new virtual testing driver for the GPIO API
dt-bindings: gpio: vf610: Allow gpio-line-names to be set
gpio: sim: lock GPIOs as interrupts when they are requested
genirq/irq_sim: add an extended irq_sim initializer
dt-bindings: gpio: fsl,qoriq-gpio: Add compatible string fsl,ls1046a-gpio
gpiolib: unexport gpiochip_get_desc()
gpio: add sloppy logic analyzer using polling
Documentation: gpio: Reconfiguration with unset direction (uAPI v2)
Documentation: gpio: Reconfiguration with unset direction (uAPI v1)
dt-bindings: gpio: fsl,qoriq-gpio: add common property gpio-line-names
gpio: ath79: convert to dynamic GPIO base allocation
pinctrl: da9062: replace gpiochip_get_desc() with gpio_device_get_desc()
gpiolib: put gpio_suffixes in a single compilation unit
Documentation: gpio: Clarify effect of active low flag on line edges
Documentation: gpio: Clarify effect of active low flag on line values
gpiolib: Remove data-less gpiochip_add() function
gpio: sim: use devm_mutex_init()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"cpus_read_lock() was dropped from workqueue creation path but there
were still remaining lockdep_assert_cpus_held() triggering spurious
lockdep failures. Remove them"
* tag 'wq-for-6.11-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Remove unneeded lockdep_assert_cpus_held()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"The biggest part is the virtual CPU hotplug that touches ACPI,
irqchip. We also have some GICv3 optimisation for pseudo-NMIs that has
been queued via the arm64 tree. Otherwise the usual perf updates,
kselftest, various small cleanups.
Core:
- Virtual CPU hotplug support for arm64 ACPI systems
- cpufeature infrastructure cleanups and making the FEAT_ECBHB ID
bits visible to guests
- CPU errata: expand the speculative SSBS workaround to more CPUs
- GICv3, use compile-time PMR values: optimise the way regular IRQs
are masked/unmasked when GICv3 pseudo-NMIs are used, removing the
need for a static key in fast paths by using a priority value
chosen dynamically at boot time
ACPI:
- 'acpi=nospcr' option to disable SPCR as default console for arm64
- Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/
Perf updates:
- Rework of the IMX PMU driver to enable support for I.MX95
- Enable support for tertiary match groups in the CMN PMU driver
- Initial refactoring of the CPU PMU code to prepare for the fixed
instruction counter introduced by Arm v9.4
- Add missing PMU driver MODULE_DESCRIPTION() strings
- Hook up DT compatibles for recent CPU PMUs
Kselftest updates:
- Kernel mode NEON fp-stress
- Cleanups, spelling mistakes
Miscellaneous:
- arm64 Documentation update with a minor clarification on TBI
- Fix missing IPI statistics
- Implement raw_smp_processor_id() using thread_info rather than a
per-CPU variable (better code generation)
- Make MTE checking of in-kernel asynchronous tag faults conditional
on KASAN being enabled
- Minor cleanups, typos"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (69 commits)
selftests: arm64: tags: remove the result script
selftests: arm64: tags_test: conform test to TAP output
perf: add missing MODULE_DESCRIPTION() macros
arm64: smp: Fix missing IPI statistics
irqchip/gic-v3: Fix 'broken_rdists' unused warning when !SMP and !ACPI
ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64
Documentation: arm64: Update memory.rst for TBI
arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1
KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1
perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
perf: arm_v6/7_pmu: Drop non-DT probe support
perf/arm: Move 32-bit PMU drivers to drivers/perf/
perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU
perf: imx_perf: add support for i.MX95 platform
perf: imx_perf: fix counter start and config sequence
perf: imx_perf: refactor driver for imx93
perf: imx_perf: let the driver manage the counter usage rather the user
perf: imx_perf: add macro definitions for parsing config attr
...
|
|
The commit 19af45757383 ("workqueue: Remove cpus_read_lock() from
apply_wqattrs_lock()") removes the unneed cpus_read_lock() after the pwq
creations and installations have been reworked based on wq_online_cpumask
rather than cpu_online_mask making cpus_read_lock() is unneeded during
wqattrs changes.
But it desn't remove the lockdep_assert_cpus_held() checks during wqattrs
changes, which leads to complaints from lockdep reported by kernel test
robot:
[ 15.726567][ T131] ------------[ cut here ]------------
[ 15.728117][ T131] WARNING: CPU: 1 PID: 131 at kernel/cpu.c:525 lockdep_assert_cpus_held (kernel/cpu.c:525)
[ 15.731191][ T131] Modules linked in: floppy(+) parport_pc(+) parport qemu_fw_cfg rtc_cmos
[ 15.733423][ T131] CPU: 1 PID: 131 Comm: systemd-udevd Tainted: G T 6.10.0-rc2-00254-g19af45757383 #1 df6f039f42e8818bf9a534449362ebad1aad32e2
[ 15.737011][ T131] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 15.739760][ T131] EIP: lockdep_assert_cpus_held (kernel/cpu.c:525)
[ 15.741326][ T131] Code: 97 c2 03 72 20 83 3d f4 73 97 c2 00 74 17 55 89 e5 b8 fc bd 4d c2 ba ff ff ff ff e8 e4 57 d1 00 85 c0 74 06 5d 31 c0 31 d2 c3 <0f> 0b eb f6 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 b8
Fix it by removing the unneeded lockdep_assert_cpus_held().
Also remove the unneed cpus_read_lock() from wq_affn_dfl_set().
tj: Dropped the removal of cpus_read_lock/unlock() in wq_affn_dfl_set() to
keep this patch fix only.
Cc: kernel test robot <[email protected]>
Fixes: 19af45757383("workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Pull workqueue updates from Tejun Heo:
- Lai fixed a bug where CPU hotplug and workqueue attribute changes
race leaving some workqueues not fully updated. This involved
refactoring and changing how online CPUs are tracked. The resulting
code is cleaner.
- Workqueue watchdog touch operation was causing too much cacheline
contention on very large machines. Nicholas improved scalabililty by
avoiding unnecessary global updates.
- Code cleanups and minor rescuer behavior improvement.
- The last commit 58629d4871e8 ("workqueue: Always queue work items to
the newest PWQ for order workqueues") is a cherry-picked straggler
commit from for-6.10-fixes, a fix for a bug which may not actually
trigger.
* tag 'wq-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (24 commits)
workqueue: Always queue work items to the newest PWQ for order workqueues
workqueue: Rename wq_update_pod() to unbound_wq_update_pwq()
workqueue: Remove the arguments @hotplug_cpu and @online from wq_update_pod()
workqueue: Remove the argument @cpu_going_down from wq_calc_pod_cpumask()
workqueue: Remove the unneeded cpumask empty check in wq_calc_pod_cpumask()
workqueue: Remove cpus_read_lock() from apply_wqattrs_lock()
workqueue: Simplify wq_calc_pod_cpumask() with wq_online_cpumask
workqueue: Add wq_online_cpumask
workqueue: Init rescuer's affinities as the wq's effective cpumask
workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.
workqueue: Move kthread_flush_worker() out of alloc_and_link_pwqs()
workqueue: Make rescuer initialization as the last step of the creation of a new wq
workqueue: Register sysfs after the whole creation of the new wq
workqueue: Simplify goto statement
workqueue: Update cpumasks after only applying it successfully
workqueue: Improve scalability of workqueue watchdog touch
workqueue: wq_watchdog_touch is always called with valid CPU
workqueue: Remove useless pool->dying_workers
workqueue: Detach workers directly in idle_cull_fn()
workqueue: Don't bind the rescuer in the last working cpu
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Added Michal Koutný as a maintainer
- Counters in pids.events were behaving inconsistently. pids.events
made properly hierarchical and pids.events.local added
- misc.peak and misc.events.local added
- cpuset remote partition creation and cpuset.cpus.exclusive handling
improved
- Code cleanups, non-critical fixes, doc updates
- for-6.10-fixes is merged in to receive two non-critical fixes that
didn't trigger pull
* tag 'cgroup-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits)
cgroup: Add Michal Koutný as a maintainer
cgroup/misc: Introduce misc.events.local
cgroup/rstat: add force idle show helper
cgroup: Protect css->cgroup write under css_set_lock
cgroup/misc: Introduce misc.peak
cgroup_misc: add kernel-doc comments for enum misc_res_type
cgroup/cpuset: Prevent UAF in proc_cpuset_show()
selftest/cgroup: Update test_cpuset_prs.sh to match changes
cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpus
cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition
selftest/cgroup: Fix test_cpuset_prs.sh problems reported by test robot
cgroup/cpuset: Fix remote root partition creation problem
cgroup: avoid the unnecessary list_add(dying_tasks) in cgroup_exit()
cgroup/cpuset: Optimize isolated partition only generate_sched_domains() calls
cgroup/cpuset: Reduce the lock protecting CS_SCHED_LOAD_BALANCE
kernel/cgroup: cleanup cgroup_base_files when fail to add cgroup_psi_files
selftests: cgroup: Add basic tests for pids controller
selftests: cgroup: Lexicographic order in Makefile
cgroup/pids: Add pids.events.local
cgroup/pids: Make event counters hierarchical
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull KCSAN updates from Paul McKenney:
- improve the documentation for the new __data_racy type qualifier
to the data_race() macro's kernel-doc header and to the LKMM's
access-marking documentation
- add missing MODULE_DESCRIPTION
* tag 'kcsan.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
kcsan: Add missing MODULE_DESCRIPTION() macro
kcsan: Add example to data_race() kerneldoc header
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull torture-test updates from Paul McKenney:
"This adds MODULE_DESCRIPTION() to torture.c, locktorture.c, and
scftorture.c, and also adds 'static' to a global variable that is used
only in scftorture.c"
* tag 'torture.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
scftorture: Make torture_type static
scftorture: Add MODULE_DESCRIPTION()
locktorture: Add MODULE_DESCRIPTION()
torture: Add MODULE_DESCRIPTION()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Update Tasks RCU and Tasks Rude RCU description in Requirements.rst
and clarify rcu_assign_pointer() and rcu_dereference() ordering
properties
- Add lockdep assertions for RCU readers, limit inline wakeups for
callback-bypass synchronize_rcu(), add an
rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter, add
Uladzislau Rezki as RCU maintainer, and fix a subtle
callback-migration memory-ordering issue
- Remove a number of redundant memory barriers
- Remove unnecessary bypass-list lock-contention mitigation, use
parking API instead of open-coded ad-hoc equivalent, and upgrade
obsolete comments
- Revert avoidance of a deadlock that can no longer occur and properly
synchronize Tasks Trace RCU checking of runqueues
- Add tests for handling of double-call_rcu() bug, add missing
MODULE_DESCRIPTION, and add a script that histograms the number of
calls to RCU updaters
- Fill out SRCU polled-grace-period API
* tag 'rcu.2024.07.12a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (29 commits)
rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation
rcu: Eliminate lockless accesses to rcu_sync->gp_count
MAINTAINERS: Add Uladzislau Rezki as RCU maintainer
rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
rcu/exp: Remove redundant full memory barrier at the end of GP
rcu: Remove full memory barrier on RCU stall printout
rcu: Remove full memory barrier on boot time eqs sanity check
rcu/exp: Remove superfluous full memory barrier upon first EQS snapshot
rcu: Remove superfluous full memory barrier upon first EQS snapshot
rcu: Remove full ordering on second EQS snapshot
srcu: Fill out polled grace-period APIs
srcu: Update cleanup_srcu_struct() comment
srcu: Add NUM_ACTIVE_SRCU_POLL_OLDSTATE
srcu: Disable interrupts directly in srcu_gp_end()
rcu: Disable interrupts directly in rcu_gp_init()
rcu/tree: Reduce wake up for synchronize_rcu() common case
rcu/tasks: Fix stale task snaphot for Tasks Trace
tools/rcu: Add rcu-updaters.sh script
rcutorture: Add missing MODULE_DESCRIPTION() macros
rcutorture: Fix rcu_torture_fwd_cb_cr() data race
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Updates for timers, timekeeping and related functionality:
Core:
- Make the takeover of a hrtimer based broadcast timer reliable
during CPU hot-unplug. The current implementation suffers from a
race which can lead to broadcast timer starvation in the worst
case.
- VDSO related cleanups and simplifications
- Small cleanups and enhancements all over the place
PTP:
- Replace the architecture specific base clock to clocksource, e.g.
ART to TSC, conversion function with generic functionality to avoid
exposing such internals to drivers and convert all existing drivers
over. This also allows to provide functionality which converts the
other way round in the core code based on the same parameter set.
- Provide a function to convert CLOCK_REALTIME to the base clock to
support the upcoming PPS output driver on Intel platforms.
Drivers:
- A set of Device Tree bindings for new hardware
- Cleanups and enhancements all over the place"
* tag 'timers-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
clocksource/drivers/realtek: Add timer driver for rtl-otto platforms
dt-bindings: timer: Add schema for realtek,otto-timer
dt-bindings: timer: Add SOPHGO SG2002 clint
dt-bindings: timer: renesas,tmu: Add R-Car Gen2 support
dt-bindings: timer: renesas,tmu: Add RZ/G1 support
dt-bindings: timer: renesas,tmu: Add R-Mobile APE6 support
clocksource/drivers/mips-gic-timer: Correct sched_clock width
clocksource/drivers/mips-gic-timer: Refine rating computation
clocksource/drivers/sh_cmt: Address race condition for clock events
clocksource/driver/arm_global_timer: Remove unnecessary ‘0’ values from err
clocksource/drivers/arm_arch_timer: Remove unnecessary ‘0’ values from irq
tick/broadcast: Make takeover of broadcast hrtimer reliable
tick/sched: Combine WARN_ON_ONCE and print_once
x86/vdso: Remove unused include
x86/vgtod: Remove unused typedef gtod_long_t
x86/vdso: Fix function reference in comment
vdso: Add comment about reason for vdso struct ordering
vdso/gettimeofday: Clarify comment about open coded function
timekeeping: Add missing kernel-doc function comments
tick: Remove unnused tick_nohz_get_idle_calls()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug updates from Thomas Gleixner:
"A small set of SMP/CPU hotplug updates:
- Reverse the order of iteration when freezing secondary CPUs for
hibernation.
This avoids that drivers like the Intel uncore performance counter
have to transfer the assignement of handling the per package uncore
events for every CPU in a package, which is a considerable speedup
on larger systems.
- Add a missing destroy_work_on_stack() invocation in
smp_call_on_cpu() to prevent debug objects to emit a false positive
warning when the stack is freed.
- Small cleanups in comments and a str_plural() conversion"
* tag 'smp-core-2024-07-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu()
cpu/hotplug: Reverse order of iteration in freeze_secondary_cpus()
smp: Use str_plural() to fix Coccinelle warnings
cpu/hotplug: Fix typo in comment
|
|
Pull io_uring updates from Jens Axboe:
"Here are the io_uring updates queued up for 6.11.
Nothing major this time around, various minor improvements and
cleanups/fixes. This contains:
- Add bind/listen opcodes. Main motivation is to support direct
descriptors, to avoid needing a regular fd just for doing these two
operations (Gabriel)
- Probe fixes (Gabriel)
- Treat io-wq work flags as atomics. Not fixing a real issue, but may
as well and it silences a KCSAN warning (me)
- Cleanup of rsrc __set_current_state() usage (me)
- Add 64-bit for {m,f}advise operations (me)
- Improve performance of data ring messages (me)
- Fix for ring message overflow posting (Pavel)
- Fix for freezer interaction with TWA_NOTIFY_SIGNAL. Not strictly an
io_uring thing, but since TWA_NOTIFY_SIGNAL was originally added
for faster task_work signaling for io_uring, bundling it with this
pull (Pavel)
- Add Pavel as a co-maintainer
- Various cleanups (me, Thorsten)"
* tag 'for-6.11/io_uring-20240714' of git://git.kernel.dk/linux: (28 commits)
io_uring/net: check socket is valid in io_bind()/io_listen()
kernel: rerun task_work while freezing in get_signal()
io_uring/io-wq: limit retrying worker initialisation
io_uring/napi: Remove unnecessary s64 cast
io_uring/net: cleanup io_recv_finish() bundle handling
io_uring/msg_ring: fix overflow posting
MAINTAINERS: change Pavel Begunkov from io_uring reviewer to maintainer
io_uring/msg_ring: use kmem_cache_free() to free request
io_uring/msg_ring: check for dead submitter task
io_uring/msg_ring: add an alloc cache for io_kiocb entries
io_uring/msg_ring: improve handling of target CQE posting
io_uring: add io_add_aux_cqe() helper
io_uring: add remote task_work execution helper
io_uring/msg_ring: tighten requirement for remote posting
io_uring: Allocate only necessary memory in io_probe
io_uring: Fix probe of disabled operations
io_uring: Introduce IORING_OP_LISTEN
io_uring: Introduce IORING_OP_BIND
net: Split a __sys_listen helper for io_uring
net: Split a __sys_bind helper for io_uring
...
|
|
Use the vma_pages() helper function and fix the following
Coccinelle/coccicheck warning reported by vma_pages.cocci:
WARNING: Consider using vma_pages helper on vma
Rename the local variable vma_pages accordingly.
Signed-off-by: Thorsten Blum <[email protected]>
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
pid_list_fill_irq() runs via irq_work.
When CONFIG_PREEMPT_RT is disabled, it would run in irq_context.
so it shouldn't sleep while memory allocation.
Change gfp flags from GFP_KERNEL to GFP_NOWAIT to prevent sleep in
irq_work.
This change wouldn't impact functionality in practice because the worst-size
is 2K.
Cc: [email protected]
Fixes: 8d6e90983ade2 ("tracing: Create a sparse bitmask for pid filtering")
Link: https://lore.kernel.org/[email protected]
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: levi.yun <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The trace_array_create_systems() function returns error pointers, not
NULL. Fix the check to match.
Cc: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Link: https://lore.kernel.org/[email protected]
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Support passing NULL along AT_EMPTY_PATH for statx().
NULL paths with any flag value other than AT_EMPTY_PATH go the
usual route and end up with -EFAULT to retain compatibility (Rust
is abusing calls of the sort to detect availability of statx)
This avoids path lookup code, lockref management, memory allocation
and in case of NULL path userspace memory access (which can be
quite expensive with SMAP on x86_64)
- Don't block i_writecount during exec. Remove the
deny_write_access() mechanism for executables
- Relax open_by_handle_at() permissions in specific cases where we
can prove that the caller had sufficient privileges to open a file
- Switch timespec64 fields in struct inode to discrete integers
freeing up 4 bytes
Fixes:
- Fix false positive circular locking warning in hfsplus
- Initialize hfs_inode_info after hfs_alloc_inode() in hfs
- Avoid accidental overflows in vfs_fallocate()
- Don't interrupt fallocate with EINTR in tmpfs to avoid constantly
restarting shmem_fallocate()
- Add missing quote in comment in fs/readdir
Cleanups:
- Don't assign and test in an if statement in mqueue. Move the
assignment out of the if statement
- Reflow the logic in may_create_in_sticky()
- Remove the usage of the deprecated ida_simple_xx() API from procfs
- Reject FSCONFIG_CMD_CREATE_EXCL requets that depend on the new
mount api early
- Rename variables in copy_tree() to make it easier to understand
- Replace WARN(down_read_trylock, ...) abuse with proper asserts in
various places in the VFS
- Get rid of user_path_at_empty() and drop the empty argument from
getname_flags()
- Check for error while copying and no path in one branch in
getname_flags()
- Avoid redundant smp_mb() for THP handling in do_dentry_open()
- Rename parent_ino to d_parent_ino and make it use RCU
- Remove unused header include in fs/readdir
- Export in_group_capable() helper and switch f2fs and fuse over to
it instead of open-coding the logic in both places"
* tag 'vfs-6.11.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
ipc: mqueue: remove assignment from IS_ERR argument
vfs: rename parent_ino to d_parent_ino and make it use RCU
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
stat: use vfs_empty_path() helper
fs: new helper vfs_empty_path()
fs: reflow may_create_in_sticky()
vfs: remove redundant smp_mb for thp handling in do_dentry_open
fuse: Use in_group_or_capable() helper
f2fs: Use in_group_or_capable() helper
fs: Export in_group_or_capable()
vfs: reorder checks in may_create_in_sticky
hfs: fix to initialize fields of hfs_inode_info after hfs_alloc_inode()
proc: Remove usage of the deprecated ida_simple_xx() API
hfsplus: fix to avoid false alarm of circular locking
Improve readability of copy_tree
vfs: shave a branch in getname_flags
vfs: retire user_path_at_empty and drop empty arg from getname_flags
vfs: stop using user_path_at_empty in do_readlinkat
tmpfs: don't interrupt fallocate with EINTR
fs: don't block i_writecount during exec
...
|
|
This reimplements commit 951bcae6c5a0 ("kallsyms: Avoid weak references
for kallsyms symbols") because I am not a big fan of PROVIDE().
As an alternative solution, this commit prepends one more kallsyms step.
KSYMS .tmp_vmlinux.kallsyms0.S # added
AS .tmp_vmlinux.kallsyms0.o # added
LD .tmp_vmlinux.btf
BTF .btf.vmlinux.bin.o
LD .tmp_vmlinux.kallsyms1
NM .tmp_vmlinux.kallsyms1.syms
KSYMS .tmp_vmlinux.kallsyms1.S
AS .tmp_vmlinux.kallsyms1.o
LD .tmp_vmlinux.kallsyms2
NM .tmp_vmlinux.kallsyms2.syms
KSYMS .tmp_vmlinux.kallsyms2.S
AS .tmp_vmlinux.kallsyms2.o
LD vmlinux
Step 0 takes /dev/null as input, and generates .tmp_vmlinux.kallsyms0.o,
which has a valid kallsyms format with the empty symbol list, and can be
linked to vmlinux. Since it is really small, the added compile-time cost
is negligible.
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Ard Biesheuvel <[email protected]>
Reviewed-by: Nicolas Schier <[email protected]>
|
|
Merge all the slab patches previously collected on top of v6.10-rc1,
over cleanups/fixes that had to be based on rc6.
|
|
To ensure non-reentrancy, __queue_work() attempts to enqueue a work
item to the pool of the currently executing worker. This is not only
unnecessary for an ordered workqueue, where order inherently suggests
non-reentrancy, but it could also disrupt the sequence if the item is
not enqueued on the newest PWQ.
Just queue it to the newest PWQ and let order management guarantees
non-reentrancy.
Signed-off-by: Lai Jiangshan <[email protected]>
Fixes: 4c065dbce1e8 ("workqueue: Enable unbound cpumask update on ordered workqueues")
Cc: [email protected] # v6.9+
Signed-off-by: Tejun Heo <[email protected]>
(cherry picked from commit 74347be3edfd11277799242766edf844c43dd5d3)
|
|
The type_id is defined as u32type, if(type_id<0) is invalid, hence
modified its type to s32.
./kernel/sched/ext.c:4958:5-12: WARNING: Unsigned expression compared with zero: type_id < 0.
Reported-by: Abaci Robot <[email protected]>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9523
Signed-off-by: Jiapeng Chong <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
|
|
Add a new helper to disable lockdep tracking entirely for a given class.
This is needed for bcachefs, which takes too many btree node locks for
lockdep to track. Instead, we have a single lockdep_map for "btree_trans
has any btree nodes locked", which makes more since given that we have
centralized lock management and a cycle detector.
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Borislav Petkov:
- Fix a performance regression when measuring the CPU time of a thread
(clock_gettime(CLOCK_THREAD_CPUTIME_ID,...)) due to the addition of
PSI IRQ time accounting in the hotpath
- Fix a task_struct leak due to missing to decrement the refcount when
the task is enqueued before the timer which is supposed to do that,
expires
- Revert an attempt to expedite detaching of movable tasks, as finding
those could become very costly. Turns out the original issue wasn't
even hit by anyone
* tag 'sched_urgent_for_v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Move psi_account_irqtime() out of update_rq_clock_task() hotpath
sched/deadline: Fix task_struct reference leak
Revert "sched/fair: Make sure to try to detach at least one movable task"
|
|
https://git.linaro.org/people/daniel.lezcano/linux into timers/core
Pull clocksource/event driver updates from Daniel Lezcano:
- Remove unnecessary local variables initialization as they will be
initialized in the code path anyway right after on the ARM arch
timer and the ARM global timer (Li kunyu)
- Fix a race condition in the interrupt leading to a deadlock on the
SH CMT driver. Note that this fix was not tested on the platform
using this timer but the fix seems reasonable enough to be picked
confidently (Niklas Söderlund)
- Increase the rating of the gic-timer and use the configured width
clocksource register on the MIPS architecture (Jiaxun Yang)
- Add the DT bindings for the TMU on the Renesas platforms (Geert
Uytterhoeven)
- Add the DT bindings for the SOPHGO SG2002 clint on RiscV (Thomas
Bonnefille)
- Add the rtl-otto timer driver along with the DT bindings for the
Realtek platform (Chris Packham)
Link: https://lore.kernel.org/all/[email protected]
|
|
Describe the pool argument in the kernel-doc comment for
swiotlb_del_transient.
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-07-12
We've added 23 non-merge commits during the last 3 day(s) which contain
a total of 18 files changed, 234 insertions(+), 243 deletions(-).
The main changes are:
1) Improve BPF verifier by utilizing overflow.h helpers to check
for overflows, from Shung-Hsi Yu.
2) Fix NULL pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
when attr->attach_prog_fd was not specified, from Tengda Wu.
3) Fix arm64 BPF JIT when generating code for BPF trampolines with
BPF_TRAMP_F_CALL_ORIG which corrupted upper address bits,
from Puranjay Mohan.
4) Remove test_run callback from lwt_seg6local_prog_ops which never worked
in the first place and caused syzbot reports,
from Sebastian Andrzej Siewior.
5) Relax BPF verifier to accept non-zero offset on KF_TRUSTED_ARGS/
/KF_RCU-typed BPF kfuncs, from Matt Bobrowski.
6) Fix a long standing bug in libbpf with regards to handling of BPF
skeleton's forward and backward compatibility, from Andrii Nakryiko.
7) Annotate btf_{seq,snprintf}_show functions with __printf,
from Alan Maguire.
8) BPF selftest improvements to reuse common network helpers in sk_lookup
test and dropping the open-coded inetaddr_len() and make_socket() ones,
from Geliang Tang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits)
selftests/bpf: Test for null-pointer-deref bugfix in resolve_prog_type()
bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
selftests/bpf: DENYLIST.aarch64: Skip fexit_sleep again
bpf: use check_sub_overflow() to check for subtraction overflows
bpf: use check_add_overflow() to check for addition overflows
bpf: fix overflow check in adjust_jmp_off()
bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o
bpf: annotate BTF show functions with __printf
bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG
selftests/bpf: Close obj in error path in xdp_adjust_tail
selftests/bpf: Null checks for links in bpf_tcp_ca
selftests/bpf: Use connect_fd_to_fd in sk_lookup
selftests/bpf: Use start_server_addr in sk_lookup
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Add ASSERT_OK_FD macro
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: fix compilation failure when CONFIG_NF_FLOW_TABLE=m
bpf: Remove tst_run from lwt_seg6local_prog_ops.
bpf: relax zero fixed offset constraint on KF_TRUSTED_ARGS/KF_RCU
...
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Replace the deprecated[1] use of strncpy() in bacct_add_tsk(). Since this
is UAPI, include trailing padding in the copy.
Link: https://github.com/KSPP/linux/issues/90 [1]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kees Cook <[email protected]>
Cc: "Dr. Thomas Orgis" <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Ismael Luceno <[email protected]>
Cc: Peng Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
On powerpc 8xx, when a page is 8M size, the information is in the PMD
entry. So allow architectures to provide __pte_leaf_size() instead of
pte_leaf_size() and provide the PMD entry to that function.
When __pte_leaf_size() is not defined, define it as a pte_leaf_size() so
that architectures not interested in the PMD arguments are not impacted.
Only define a default pte_leaf_size() when __pte_leaf_size() is not
defined to make sure nobody adds new calls to pte_leaf_size() in the core.
Link: https://lkml.kernel.org/r/c7c008f0a314bf8029ad7288fdc908db1ec7e449.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Peter Xu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
In ops.dispatch(), SCX_DSQ_LOCAL_ON can be used to dispatch the task to the
local DSQ of any CPU. However, during direct dispatch from ops.select_cpu()
and ops.enqueue(), this isn't allowed. This is because dispatching to the
local DSQ of a remote CPU requires locking both the task's current and new
rq's and such double locking can't be done directly from ops.enqueue().
While waking up a task, as ops.select_cpu() can pick any CPU and both
ops.select_cpu() and ops.enqueue() can use SCX_DSQ_LOCAL as the dispatch
target to dispatch to the DSQ of the picked CPU, the BPF scheduler can still
do whatever it wants to do. However, while a task is being enqueued for a
different reason, e.g. after its slice expiration, only ops.enqueue() is
called and there's no way for the BPF scheduler to directly dispatch to the
local DSQ of a remote CPU. This gap in API forces schedulers into
work-arounds which are not straightforward or optimal such as skipping
direct dispatches in such cases.
Implement deferred enqueueing to allow directly dispatching to the local DSQ
of a remote CPU from ops.select_cpu() and ops.enqueue(). Such tasks are
temporarily queued on rq->scx.ddsp_deferred_locals. When the rq lock can be
safely released, the tasks are taken off the list and queued on the target
local DSQs using dispatch_to_local_dsq().
v2: - Add missing return after queue_balance_callback() in
schedule_deferred(). (David).
- dispatch_to_local_dsq() now assumes that @rq is locked but unpinned
and thus no longer takes @rf. Updated accordingly.
- UP build warning fix.
Signed-off-by: Tejun Heo <[email protected]>
Tested-by: Andrea Righi <[email protected]>
Acked-by: David Vernet <[email protected]>
Cc: Dan Schatzberg <[email protected]>
Cc: Changwoo Min <[email protected]>
|
|
SCX_RQ_BALANCING is used to mark that the rq is currently in balance().
Rename it to SCX_RQ_IN_BALANCE and add SCX_RQ_IN_WAKEUP which marks whether
the rq is currently enqueueing for a wakeup. This will be used to implement
direct dispatching to local DSQ of another CPU.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Vernet <[email protected]>
|
|
sched_ext often needs to migrate tasks across CPUs right before execution
and thus uses the balance path to dispatch tasks from the BPF scheduler.
balance_scx() is called with rq locked and pinned but is passed @rf and thus
allowed to unpin and unlock. Currently, @rf is passed down the call stack so
the rq lock is unpinned just when double locking is needed.
This creates unnecessary complications such as having to explicitly
manipulate lock pinning for core scheduling. We also want to use
dispatch_to_local_dsq_lock() from other paths which are called with rq
locked but unpinned.
rq lock handling in the dispatch path is straightforward outside the
migration implementation and extending the pinning protection down the
callstack doesn't add enough meaningful extra protection to justify the
extra complexity.
Unpin and repin rq lock from the outer balance_scx() and drop @rf passing
and lock pinning handling from the inner functions. UP is updated to call
balance_one() instead of balance_scx() to avoid adding NULL @rf handling to
balance_scx(). AS this makes balance_scx() unused in UP, it's put inside a
CONFIG_SMP block.
No functional changes intended outside of lock annotation updates.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Vernet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrea Righi <[email protected]>
|
|
task_linked_on_dsq() exists as a helper because it used to test both the
rbtree and list nodes. It now only tests the list node and the list node
will soon be used for something else too. The helper doesn't improve
anything materially and the naming will become confusing. Open-code the list
node testing and remove task_linked_on_dsq()
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Vernet <[email protected]>
|
|
Move struct balance_callback definition upward so that it's visible to
class-specific rq struct definitions. This will be used to embed a struct
balance_callback in struct scx_rq.
No functional changes.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: David Vernet <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
|
|
Currently the event counting provided by misc.events is hierarchical,
it's not practical if user is only concerned with events of a
specified cgroup. Therefore, introduce misc.events.local collect events
specific to the given cgroup.
This is analogous to memory.events.local and pids.events.local.
Signed-off-by: Xiu Jianfeng <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Similar to previous patch that drops signed_add*_overflows() and uses
(compiler) builtin-based check_add_overflow(), do the same for
signed_sub*_overflows() and replace them with the generic
check_sub_overflow() to make future refactoring easier and have the
checks implemented more efficiently.
Unsigned overflow check for subtraction does not use helpers and are
simple enough already, so they're left untouched.
After the change GCC 13.3.0 generates cleaner assembly on x86_64:
if (check_sub_overflow(*dst_smin, src_reg->smax_value, dst_smin) ||
139bf: mov 0x28(%r12),%rax
139c4: mov %edx,0x54(%r12)
139c9: sub %r11,%rax
139cc: mov %rax,0x28(%r12)
139d1: jo 14627 <adjust_reg_min_max_vals+0x1237>
check_sub_overflow(*dst_smax, src_reg->smin_value, dst_smax)) {
139d7: mov 0x30(%r12),%rax
139dc: sub %r9,%rax
139df: mov %rax,0x30(%r12)
if (check_sub_overflow(*dst_smin, src_reg->smax_value, dst_smin) ||
139e4: jo 14627 <adjust_reg_min_max_vals+0x1237>
...
*dst_smin = S64_MIN;
14627: movabs $0x8000000000000000,%rax
14631: mov %rax,0x28(%r12)
*dst_smax = S64_MAX;
14636: sub $0x1,%rax
1463a: mov %rax,0x30(%r12)
Before the change it gives:
if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
13a50: mov 0x28(%r12),%rdi
13a55: mov %edx,0x54(%r12)
dst_reg->smax_value = S64_MAX;
13a5a: movabs $0x7fffffffffffffff,%rdx
13a64: mov %eax,0x50(%r12)
dst_reg->smin_value = S64_MIN;
13a69: movabs $0x8000000000000000,%rax
s64 res = (s64)((u64)a - (u64)b);
13a73: mov %rdi,%rsi
13a76: sub %rcx,%rsi
if (b < 0)
13a79: test %rcx,%rcx
13a7c: js 145ea <adjust_reg_min_max_vals+0x119a>
if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
13a82: cmp %rsi,%rdi
13a85: jl 13ac7 <adjust_reg_min_max_vals+0x677>
signed_sub_overflows(dst_reg->smax_value, smin_val)) {
13a87: mov 0x30(%r12),%r8
s64 res = (s64)((u64)a - (u64)b);
13a8c: mov %r8,%rax
13a8f: sub %r9,%rax
return res > a;
13a92: cmp %rax,%r8
13a95: setl %sil
if (b < 0)
13a99: test %r9,%r9
13a9c: js 147d1 <adjust_reg_min_max_vals+0x1381>
dst_reg->smax_value = S64_MAX;
13aa2: movabs $0x7fffffffffffffff,%rdx
dst_reg->smin_value = S64_MIN;
13aac: movabs $0x8000000000000000,%rax
if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
13ab6: test %sil,%sil
13ab9: jne 13ac7 <adjust_reg_min_max_vals+0x677>
dst_reg->smin_value -= smax_val;
13abb: mov %rdi,%rax
dst_reg->smax_value -= smin_val;
13abe: mov %r8,%rdx
dst_reg->smin_value -= smax_val;
13ac1: sub %rcx,%rax
dst_reg->smax_value -= smin_val;
13ac4: sub %r9,%rdx
13ac7: mov %rax,0x28(%r12)
...
13ad1: mov %rdx,0x30(%r12)
...
if (signed_sub_overflows(dst_reg->smin_value, smax_val) ||
145ea: cmp %rsi,%rdi
145ed: jg 13ac7 <adjust_reg_min_max_vals+0x677>
145f3: jmp 13a87 <adjust_reg_min_max_vals+0x637>
Suggested-by: Jiri Olsa <[email protected]>
Signed-off-by: Shung-Hsi Yu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
signed_add*_overflows() was added back when there was no overflow-check
helper. With the introduction of such helpers in commit f0907827a8a91
("compiler.h: enable builtin overflow checkers and add fallback code"), we
can drop signed_add*_overflows() in kernel/bpf/verifier.c and use the
generic check_add_overflow() instead.
This will make future refactoring easier, and takes advantage of
compiler-emitted hardware instructions that efficiently implement these
checks.
After the change GCC 13.3.0 generates cleaner assembly on x86_64:
err = adjust_scalar_min_max_vals(env, insn, dst_reg, *src_reg);
13625: mov 0x28(%rbx),%r9 /* r9 = src_reg->smin_value */
13629: mov 0x30(%rbx),%rcx /* rcx = src_reg->smax_value */
...
if (check_add_overflow(*dst_smin, src_reg->smin_value, dst_smin) ||
141c1: mov %r9,%rax
141c4: add 0x28(%r12),%rax
141c9: mov %rax,0x28(%r12)
141ce: jo 146e4 <adjust_reg_min_max_vals+0x1294>
check_add_overflow(*dst_smax, src_reg->smax_value, dst_smax)) {
141d4: add 0x30(%r12),%rcx
141d9: mov %rcx,0x30(%r12)
if (check_add_overflow(*dst_smin, src_reg->smin_value, dst_smin) ||
141de: jo 146e4 <adjust_reg_min_max_vals+0x1294>
...
*dst_smin = S64_MIN;
146e4: movabs $0x8000000000000000,%rax
146ee: mov %rax,0x28(%r12)
*dst_smax = S64_MAX;
146f3: sub $0x1,%rax
146f7: mov %rax,0x30(%r12)
Before the change it gives:
s64 smin_val = src_reg->smin_value;
675: mov 0x28(%rsi),%r8
s64 smax_val = src_reg->smax_value;
u64 umin_val = src_reg->umin_value;
u64 umax_val = src_reg->umax_value;
679: mov %rdi,%rax /* rax = dst_reg */
if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
67c: mov 0x28(%rdi),%rdi /* rdi = dst_reg->smin_value */
u64 umin_val = src_reg->umin_value;
680: mov 0x38(%rsi),%rdx
u64 umax_val = src_reg->umax_value;
684: mov 0x40(%rsi),%rcx
s64 res = (s64)((u64)a + (u64)b);
688: lea (%r8,%rdi,1),%r9 /* r9 = dst_reg->smin_value + src_reg->smin_value */
return res < a;
68c: cmp %r9,%rdi
68f: setg %r10b /* r10b = (dst_reg->smin_value + src_reg->smin_value) > dst_reg->smin_value */
if (b < 0)
693: test %r8,%r8
696: js 72b <scalar_min_max_add+0xbb>
signed_add_overflows(dst_reg->smax_value, smax_val)) {
dst_reg->smin_value = S64_MIN;
dst_reg->smax_value = S64_MAX;
69c: movabs $0x7fffffffffffffff,%rdi
s64 smax_val = src_reg->smax_value;
6a6: mov 0x30(%rsi),%r8
dst_reg->smin_value = S64_MIN;
6aa: 00 00 00 movabs $0x8000000000000000,%rsi
if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
6b4: test %r10b,%r10b /* (dst_reg->smin_value + src_reg->smin_value) > dst_reg->smin_value ? goto 6cb */
6b7: jne 6cb <scalar_min_max_add+0x5b>
signed_add_overflows(dst_reg->smax_value, smax_val)) {
6b9: mov 0x30(%rax),%r10 /* r10 = dst_reg->smax_value */
s64 res = (s64)((u64)a + (u64)b);
6bd: lea (%r10,%r8,1),%r11 /* r11 = dst_reg->smax_value + src_reg->smax_value */
if (b < 0)
6c1: test %r8,%r8
6c4: js 71e <scalar_min_max_add+0xae>
if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
6c6: cmp %r11,%r10 /* (dst_reg->smax_value + src_reg->smax_value) <= dst_reg->smax_value ? goto 723 */
6c9: jle 723 <scalar_min_max_add+0xb3>
} else {
dst_reg->smin_value += smin_val;
dst_reg->smax_value += smax_val;
}
6cb: mov %rsi,0x28(%rax)
...
6d5: mov %rdi,0x30(%rax)
...
if (signed_add_overflows(dst_reg->smin_value, smin_val) ||
71e: cmp %r11,%r10
721: jl 6cb <scalar_min_max_add+0x5b>
dst_reg->smin_value += smin_val;
723: mov %r9,%rsi
dst_reg->smax_value += smax_val;
726: mov %r11,%rdi
729: jmp 6cb <scalar_min_max_add+0x5b>
return res > a;
72b: cmp %r9,%rdi
72e: setl %r10b
732: jmp 69c <scalar_min_max_add+0x2c>
737: nopw 0x0(%rax,%rax,1)
Note: unlike adjust_ptr_min_max_vals() and scalar*_min_max_add(), it is
necessary to introduce intermediate variable in adjust_jmp_off() to keep
the functional behavior unchanged. Without an intermediate variable
imm/off will be altered even on overflow.
Suggested-by: Jiri Olsa <[email protected]>
Signed-off-by: Shung-Hsi Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
adjust_jmp_off() incorrectly used the insn->imm field for all overflow check,
which is incorrect as that should only be done or the BPF_JMP32 | BPF_JA case,
not the general jump instruction case. Fix it by using insn->off for overflow
check in the general case.
Fixes: 5337ac4c9b80 ("bpf: Fix the corner case with may_goto and jump to the 1st insn.")
Signed-off-by: Shung-Hsi Yu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
As reported by Mirsad [1] we still see format warnings in kernel/bpf/btf.o
at W=1 warning level:
CC kernel/bpf/btf.o
./kernel/bpf/btf.c: In function ‘btf_type_seq_show_flags’:
./kernel/bpf/btf.c:7553:21: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format]
7553 | sseq.showfn = btf_seq_show;
| ^
./kernel/bpf/btf.c: In function ‘btf_type_snprintf_show’:
./kernel/bpf/btf.c:7604:31: warning: assignment left-hand side might be a candidate for a format attribute [-Wsuggest-attribute=format]
7604 | ssnprintf.show.showfn = btf_snprintf_show;
| ^
Combined with CONFIG_WERROR=y these can halt the build.
The fix (annotating the structure field with __printf())
suggested by Mirsad resolves these. Apologies I missed this last time.
No other W=1 warnings were observed in kernel/bpf after this fix.
[1] https://lore.kernel.org/bpf/[email protected]/
Fixes: b3470da314fd ("bpf: annotate BTF show functions with __printf")
Reported-by: Mirsad Todorovac <[email protected]>
Suggested-by: Mirsad Todorovac <[email protected]>
Signed-off-by: Alan Maguire <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
What wq_update_pod() does is just to update the pwq of the specific
cpu. Rename it and update the comments.
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
The arguments @hotplug_cpu and @online are not used in wq_update_pod()
since the functions called by wq_update_pod() don't need them.
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
wq_calc_pod_cpumask() uses wq_online_cpumask, which excludes the cpu
going down, so the argument cpu_going_down is unused and can be removed.
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
The cpumask empty check in wq_calc_pod_cpumask() has long been useless.
It just works purely as documents which states that the cpumask is not
possible empty after the function returns.
Now the code above is even more explicit that the cpumask is not empty,
so the document-only empty check can be removed.
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
1726a1713590 ("workqueue: Put PWQ allocation and WQ enlistment in the same
lock C.S.") led to the following possible deadlock:
WARNING: possible recursive locking detected
6.10.0-rc5-00004-g1d4c6111406c #1 Not tainted
--------------------------------------------
swapper/0/1 is trying to acquire lock:
c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730)
but task is already holding lock:
c27760f4 (cpu_hotplug_lock){++++}-{0:0}, at: padata_alloc (kernel/padata.c:1007)
...
stack backtrace:
...
cpus_read_lock (include/linux/percpu-rwsem.h:53 kernel/cpu.c:488)
alloc_workqueue (kernel/workqueue.c:5152 kernel/workqueue.c:5730)
padata_alloc (kernel/padata.c:1007 (discriminator 1))
pcrypt_init_padata (crypto/pcrypt.c:327 (discriminator 1))
pcrypt_init (crypto/pcrypt.c:353)
do_one_initcall (init/main.c:1267)
do_initcalls (init/main.c:1328 (discriminator 1) init/main.c:1345 (discriminator 1))
kernel_init_freeable (init/main.c:1364)
kernel_init (init/main.c:1469)
ret_from_fork (arch/x86/kernel/process.c:153)
ret_from_fork_asm (arch/x86/entry/entry_32.S:737)
entry_INT80_32 (arch/x86/entry/entry_32.S:944)
This is caused by pcrypt allocating a workqueue while holding
cpus_read_lock(), so workqueue code can't do it again as that can lead to
deadlocks if down_write starts after the first down_read.
The pwq creations and installations have been reworked based on
wq_online_cpumask rather than cpu_online_mask making cpus_read_lock() is
unneeded during wqattrs changes. Fix the deadlock by removing
cpus_read_lock() from apply_wqattrs_lock().
tj: Updated changelog.
Signed-off-by: Lai Jiangshan <[email protected]>
Fixes: 1726a1713590 ("workqueue: Put PWQ allocation and WQ enlistment in the same lock C.S.")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Tejun Heo <[email protected]>
|
|
Avoid relying on cpu_online_mask for wqattrs changes so that
cpus_read_lock() can be removed from apply_wqattrs_lock().
And with wq_online_cpumask, attrs->__pod_cpumask doesn't need to be
reused as a temporary storage to calculate if the pod have any online
CPUs @attrs wants since @cpu_going_down is not in the wq_online_cpumask.
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
The new wq_online_mask mirrors the cpu_online_mask except during
hotplugging; specifically, it differs between the hotplugging stages
of workqueue_offline_cpu() and workqueue_online_cpu(), during which
the transitioning CPU is not represented in the mask.
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
-Werror=suggest-attribute=format warns about two functions
in kernel/bpf/btf.c [1]; add __printf() annotations to silence
these warnings since for CONFIG_WERROR=y they will trigger
build failures.
[1] https://lore.kernel.org/bpf/[email protected]/
Fixes: 31d0bc81637d ("bpf: Move to generic BTF show support, apply it to seq files/strings")
Reported-by: Mirsad Todorovac <[email protected]>
Signed-off-by: Alan Maguire <[email protected]>
Tested-by: Mirsad Todorovac <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/act_ct.c
26488172b029 ("net/sched: Fix UAF when resolving a clash")
3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine")
No adjacent changes.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Running the LTP hotplug stress test on a aarch64 machine results in
rcu_sched stall warnings when the broadcast hrtimer was owned by the
un-plugged CPU. The issue is the following:
CPU1 (owns the broadcast hrtimer) CPU2
tick_broadcast_enter()
// shutdown local timer device
broadcast_shutdown_local()
...
tick_broadcast_exit()
clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT)
// timer device is not programmed
cpumask_set_cpu(cpu, tick_broadcast_force_mask)
initiates offlining of CPU1
take_cpu_down()
/*
* CPU1 shuts down and does not
* send broadcast IPI anymore
*/
takedown_cpu()
hotplug_cpu__broadcast_tick_pull()
// move broadcast hrtimer to this CPU
clockevents_program_event()
bc_set_next()
hrtimer_start()
/*
* timer device is not programmed
* because only the first expiring
* timer will trigger clockevent
* device reprogramming
*/
What happens is that CPU2 exits broadcast mode with force bit set, then the
local timer device is not reprogrammed and CPU2 expects to receive the
expired event by the broadcast IPI. But this does not happen because CPU1
is offlined by CPU2. CPU switches the clockevent device to ONESHOT state,
but does not reprogram the device.
The subsequent reprogramming of the hrtimer broadcast device does not
program the clockevent device of CPU2 either because the pending expiry
time is already in the past and the CPU expects the event to be delivered.
As a consequence all CPUs which wait for a broadcast event to be delivered
are stuck forever.
Fix this issue by reprogramming the local timer device if the broadcast
force bit of the CPU is set so that the broadcast hrtimer is delivered.
[ tglx: Massage comment and change log. Add Fixes tag ]
Fixes: 989dcb645ca7 ("tick: Handle broadcast wakeup of multiple cpus")
Signed-off-by: Yu Liao <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
The xen_nopvspin boot parameter is deprecated since 2019. nopvspin
can be used instead.
Remove the xen_nopvspin boot parameter and replace the xen_pvspin
variable use cases with nopvspin.
This requires to move the nopvspin variable out of the .initdata
section, as it needs to be accessed for cpuhotplug, too.
Signed-off-by: Juergen Gross <[email protected]>
Reviewed-by: Boris Ostrovsky <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
|