path: root/kernel
2024-07-04  memcg: mm_update_next_owner: kill the "retry" logic  (Oleg Nesterov, 1 file, -30/+27)
Add the new helper, try_to_set_owner(), which tries to update mm->owner once we see c->mm == mm. This way mm_update_next_owner() doesn't need to restart the list_for_each_entry/for_each_process loops from the very beginning if it races with exit/exec, it can just continue. Unlike the current code, try_to_set_owner() re-checks tsk->mm == mm before it drops tasklist_lock, so it doesn't need get/put_task_struct(). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Jinliang Zheng <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Tycho Andersen <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-04  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski, 12 files, -213/+49)
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/phy/aquantia/aquantia.h 219343755eae ("net: phy: aquantia: add missing include guards") 61578f679378 ("net: phy: aquantia: add support for PHY LEDs") drivers/net/ethernet/wangxun/libwx/wx_hw.c bd07a9817846 ("net: txgbe: remove separate irq request for MSI and INTx") b501d261a5b3 ("net: txgbe: add FDIR ATR support") https://lore.kernel.org/all/[email protected]/ include/linux/mlx5/mlx5_ifc.h 048a403648fc ("net/mlx5: IFC updates for changing max EQs") 99be56171fa9 ("net/mlx5e: SHAMPO, Re-enable HW-GRO") https://lore.kernel.org/all/[email protected]/ Adjacent changes: drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c 4130c67cd123 ("wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference") 3f3126515fbe ("wifi: iwlwifi: mvm: add mvm-specific guard") include/net/mac80211.h 816c6bec09ed ("wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP") 5a009b42e041 ("wifi: mac80211: track changes in AP's TPE") Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-04  Merge branches 'doc.2024.06.06a', 'fixes.2024.07.04a', 'mb.2024.06.28a', 'nocb.2024.06.03a', 'rcu-tasks.2024.06.06a', 'rcutorture.2024.06.06a' and 'srcu.2024.06.18a' into HEAD  (Paul E. McKenney, 15 files, -212/+207)
doc.2024.06.06a: Documentation updates.
fixes.2024.07.04a: Miscellaneous fixes.
mb.2024.06.28a: Grace-period memory-barrier redundancy removal.
nocb.2024.06.03a: No-CB CPU updates.
rcu-tasks.2024.06.06a: RCU-Tasks updates.
rcutorture.2024.06.06a: Torture-test updates.
srcu.2024.06.18a: SRCU polled-grace-period updates.
2024-07-04  rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU invocation  (Frederic Weisbecker, 1 file, -3/+7)
When rcu_barrier() calls rcu_rdp_cpu_online() and observes a CPU off rnp->qsmaskinitnext, it means that all accesses from the offline CPU preceding the CPUHP_TEARDOWN_CPU are visible to the RCU barrier, including callbacks expiration and counter updates. However, interrupts can still fire after stop_machine() re-enables interrupts and before rcutree_report_cpu_dead(). The related accesses happening between CPUHP_TEARDOWN_CPU and rnp->qsmaskinitnext clearing are _NOT_ guaranteed to be seen by rcu_barrier() without proper ordering, especially when callbacks are invoked there to the end, making rcutree_migrate_callbacks() bypass the barrier_lock. The following theoretical race example can make rcu_barrier() hang:
    CPU 0                                     CPU 1
    -----                                     -----
    //cpu_down()
    smpboot_park_threads()
    //ksoftirqd is parked now
    <IRQ>
    rcu_sched_clock_irq()
       invoke_rcu_core()
    do_softirq()
       rcu_core()
          rcu_do_batch()
             // callback storm
             // rcu_do_batch() returns
             // before completing all of them
       // do_softirq also returns early because of
       // timeout. It defers to ksoftirqd but
       // it's parked
    </IRQ>
    stop_machine()
       take_cpu_down()
                                              rcu_barrier()
                                                 spin_lock(barrier_lock)
                                                 // observes rcu_segcblist_n_cbs(&rdp->cblist) != 0
    <IRQ>
    do_softirq()
       rcu_core()
          rcu_do_batch()
             //completes all pending callbacks
             //smp_mb() implied _after_ callback number dec
    </IRQ>
    rcutree_report_cpu_dead()
       rnp->qsmaskinitnext &= ~rdp->grpmask;
    rcutree_migrate_callbacks()
       // no callback, early return without
       // locking barrier_lock
                                                 //observes !rcu_rdp_cpu_online(rdp)
                                                 rcu_barrier_entrain()
                                                    rcu_segcblist_entrain()
                                                       // Observe rcu_segcblist_n_cbs(rsclp) == 0
                                                       // because no barrier between reading
                                                       // rnp->qsmaskinitnext and rsclp->len
                                                       rcu_segcblist_add_len()
                                                          smp_mb__before_atomic()
                                                          // will now observe the 0 count and empty
                                                          // list, but too late, we enqueue regardless
                                                          WRITE_ONCE(rsclp->len, rsclp->len + v);
                                                 // ignored barrier callback
                                                 // rcu barrier stall...
This could be solved with a read memory barrier, enforcing the message passing between rnp->qsmaskinitnext and rsclp->len, matching the full memory barrier after rsclp->len addition in rcu_segcblist_add_len() performed at the end of rcu_do_batch(). However, rcu_barrier() is complicated enough and probably doesn't need too many more subtleties. CPU down is a slowpath and the barrier_lock is seldom contended. Solve the issue by unconditionally locking the barrier_lock in rcutree_migrate_callbacks(). This makes sure that either rcu_barrier() sees the empty queue or its entrained callback will be migrated. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2024-07-04  rcu: Eliminate lockless accesses to rcu_sync->gp_count  (Oleg Nesterov, 1 file, -8/+4)
The rcu_sync structure's ->gp_count field is always accessed under the protection of that same structure's ->rss_lock field, with the exception of a pair of WARN_ON_ONCE() calls just prior to acquiring that lock in functions rcu_sync_exit() and rcu_sync_dtor(). These lockless accesses are unnecessary and impair KCSAN's ability to catch bugs that might be inserted via other lockless accesses. This commit therefore moves those WARN_ON_ONCE() calls under the lock. Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
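A minimal before/after sketch of the change described above (field names taken from the text; the exact locking primitive at each call site is assumed):
    /* Before: ->gp_count is read outside ->rss_lock, a lockless access KCSAN must tolerate. */
    WARN_ON_ONCE(rsp->gp_count == 0);
    spin_lock_irqsave(&rsp->rss_lock, flags);
    ...
    spin_unlock_irqrestore(&rsp->rss_lock, flags);

    /* After: the sanity check runs under ->rss_lock, so the access is an ordinary protected read. */
    spin_lock_irqsave(&rsp->rss_lock, flags);
    WARN_ON_ONCE(rsp->gp_count == 0);
    ...
    spin_unlock_irqrestore(&rsp->rss_lock, flags);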
2024-07-04  rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter  (Paul E. McKenney, 2 files, -2/+19)
If a CPU is running either a userspace application or a guest OS in nohz_full mode, it is possible for a system call to occur just as an RCU grace period is starting. If that CPU also has the scheduling-clock tick enabled for any reason (such as a second runnable task), and if the system was booted with rcutree.use_softirq=0, then RCU can add insult to injury by awakening that CPU's rcuc kthread, resulting in yet another task and yet more OS jitter due to switching to that task, running it, and switching back. In addition, in the common case where that system call is not of excessively long duration, awakening the rcuc task is pointless. This pointlessness is due to the fact that the CPU will enter an extended quiescent state upon returning to the userspace application or guest OS. In this case, the rcuc kthread cannot do anything that the main RCU grace-period kthread cannot do on its behalf, at least if it is given a few additional milliseconds (for example, given the time duration specified by rcutree.jiffies_till_first_fqs, give or take scheduling delays). This commit therefore adds a rcutree.nohz_full_patience_delay kernel boot parameter that specifies the grace period age (in milliseconds, rounded to jiffies) before which RCU will refrain from awakening the rcuc kthread. Preliminary experimentation suggests a value of 1000, that is, one second. Increasing rcutree.nohz_full_patience_delay will increase grace-period latency and in turn increase memory footprint, so systems with constrained memory might choose a smaller value. Systems with less-aggressive OS-jitter requirements might choose the default value of zero, which keeps the traditional immediate-wakeup behavior, thus avoiding increases in grace-period latency. [ paulmck: Apply Leonardo Bras feedback. ] Link: https://lore.kernel.org/all/[email protected]/ Reported-by: Leonardo Bras <[email protected]> Suggested-by: Leonardo Bras <[email protected]> Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Leonardo Bras <[email protected]>
2024-07-04  Merge branch 'tip/x86/cpu'  (Peter Zijlstra, 168 files, -1997/+5285)
The Lunarlake patches rely on the new VFM stuff. Signed-off-by: Peter Zijlstra <[email protected]>
2024-07-04  perf: Make rb_alloc_aux() return an error immediately if nr_pages <= 0  (Adrian Hunter, 1 file, -0/+3)
rb_alloc_aux() should not be called with nr_pages <= 0. Make it more robust and readable by returning an error immediately in that case. Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
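Presumably the added robustness is just an early bounds check at the top of rb_alloc_aux(); a hedged sketch (error code and exact placement assumed):
    /* in rb_alloc_aux(), before any allocation or get_order() math */
    if (nr_pages <= 0)
            return -EINVAL;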
2024-07-04  perf: Fix default aux_watermark calculation  (Adrian Hunter, 1 file, -1/+3)
The default aux_watermark is half the AUX area buffer size. In general, on a 64-bit architecture, the AUX area buffer size could be bigger than fits in a 32-bit type, but the calculation does not allow for that possibility. However, the aux_watermark value is recorded in a u32, so should not be more than U32_MAX either. Fix by doing the calculation in a correctly sized type, and limiting the result to U32_MAX. Fixes: d68e6799a5c8 ("perf: Cap allocation order at aux_watermark") Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
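A sketch of the kind of calculation the fix implies, done in a 64-bit type and clamped (variable names are illustrative, not the exact ones in the perf ring-buffer code):
    /* default watermark: half the AUX buffer, computed without 32-bit overflow */
    if (!watermark)
            watermark = min_t(u64, U32_MAX,
                              (u64)nr_pages << (PAGE_SHIFT - 1));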
2024-07-04  perf: Prevent passing zero nr_pages to rb_alloc_aux()  (Adrian Hunter, 1 file, -0/+2)
nr_pages is unsigned long but gets passed to rb_alloc_aux() as an int, and is stored as an int. Only power-of-2 values are accepted, so if nr_pages is a 64-bit value, it will be passed to rb_alloc_aux() as zero. That is not ideal because: 1) the value is incorrect, and 2) rb_alloc_aux() is at risk of misbehaving; although it manages to return -ENOMEM in that case, that is a result of passing zero to get_order() even though the get_order() result is documented to be undefined in that case. Fix by simply validating the maximum supported value in the first place. Use -ENOMEM error code for consistency with the current error code that is returned in that case. Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams") Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-07-04  perf: Fix perf_aux_size() for greater-than 32-bit size  (Adrian Hunter, 1 file, -1/+1)
perf_buffer->aux_nr_pages uses a 32-bit type, so a cast is needed to calculate a 64-bit size. Fixes: 45bfb2e50471 ("perf: Add AUX area to ring buffer for raw data streams") Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
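Given the description, the fix presumably amounts to widening the operand before the shift, roughly:
    static inline unsigned long perf_aux_size(struct perf_buffer *rb)
    {
            /* aux_nr_pages is a 32-bit field; widen it before shifting by PAGE_SHIFT */
            return (unsigned long)rb->aux_nr_pages << PAGE_SHIFT;
    }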
2024-07-04  sched/fair: set_load_weight() must also call reweight_task() for SCHED_IDLE tasks  (Tejun Heo, 3 files, -18/+14)
When a task's weight is being changed, set_load_weight() is called with @update_load set. As weight changes aren't trivial for the fair class, set_load_weight() calls fair.c::reweight_task() for fair class tasks. However, set_load_weight() first tests task_has_idle_policy() on entry and skips calling reweight_task() for SCHED_IDLE tasks. This is buggy as SCHED_IDLE tasks are just fair tasks with a very low weight and they would incorrectly skip load, vlag and position updates. Fix it by updating reweight_task() to take struct load_weight as idle weight can't be expressed with prio and making set_load_weight() call reweight_task() for SCHED_IDLE tasks too when @update_load is set. Fixes: 9059393e4ec1 ("sched/fair: Use reweight_entity() for set_user_nice()") Suggested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] # v4.15+ Link: http://lkml.kernel.org/r/[email protected]
2024-07-04  sched/psi: Optimise psi_group_change a bit  (Tvrtko Ursulin, 1 file, -27/+27)
The current code loops over the psi_states only to call a helper which then resolves back to the action needed for each state using a switch statement. That is effectively creating a double indirection of a kind which, given how all the states need to be explicitly listed and handled anyway, we can simply remove. Both the for loop and the switch statement that is. The benefit is both in the code size and CPU time spent in this function. YMMV but on my Steam Deck, while in a game, the patch makes the CPU usage go from ~2.4% down to ~1.2%. Text size at the same time went from 0x323 to 0x2c1. Signed-off-by: Tvrtko Ursulin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Acked-by: Johannes Weiner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2024-07-04  printk: Add match_devname_and_update_preferred_console()  (Tony Lindgren, 2 files, -15/+89)
Let's add match_devname_and_update_preferred_console() for driver subsystems to call during init when the console is ready, and its character device name is known. For now, we use it only for the serial layer to allow console=DEVNAME:0.0 style hardware-based addressing for consoles. The earlier attempt at doing this caused a regression with the kernel command line console order, as it added calling __add_preferred_console() again later on during init. A better approach was suggested by Petr where we add the deferred console to the console_cmdline[] and update it later on when the console is ready. Suggested-by: Petr Mladek <[email protected]> Co-developed-by: Petr Mladek <[email protected]> Signed-off-by: Tony Lindgren <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-07-04  genirq/irq_sim: add an extended irq_sim initializer  (Bartosz Golaszewski, 1 file, -3/+57)
Currently users of the interrupt simulator don't have any way of being notified about interrupts from the simulated domain being requested or released. This causes a problem for one of the users - the GPIO simulator - which is unable to lock the pins as interrupts. Define a structure containing callbacks to be executed on various irq_sim-related events (for now: irq request and release) and provide an extended function for creating simulated interrupt domains that takes it and a pointer to custom user data (to be passed to said callbacks) as arguments. Reviewed-by: Linus Walleij <[email protected]> Acked-by: Linus Walleij <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bartosz Golaszewski <[email protected]>
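A rough sketch of how such an extended initializer could look to a caller like the GPIO simulator; the struct layout and function name below are assumptions based on the description, not a verified API:
    struct irq_sim_ops {
            int  (*irq_sim_irq_requested)(struct irq_domain *domain,
                                          irq_hw_number_t hwirq, void *data);
            void (*irq_sim_irq_released)(struct irq_domain *domain,
                                         irq_hw_number_t hwirq, void *data);
    };

    /* hypothetical usage: lock/unlock the matching pin when an irq is (un)requested */
    domain = irq_domain_create_sim_full(fwnode, num_irqs, &gpio_sim_irq_ops, chip);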
2024-07-03  ftrace: unpoison ftrace_regs in ftrace_ops_list_func()  (Ilya Leoshkevich, 1 file, -0/+1)
Patch series "kmsan: Enable on s390", v7. Architectures use assembly code to initialize ftrace_regs and call ftrace_ops_list_func(). Therefore, from the KMSAN's point of view, ftrace_regs is poisoned on ftrace_ops_list_func entry(). This causes KMSAN warnings when running the ftrace testsuite. Fix by trusting the architecture-specific assembly code and always unpoisoning ftrace_regs in ftrace_ops_list_func. The issue was not encountered on x86_64 so far only by accident: assembly-allocated ftrace_regs was overlapping a stale partially unpoisoned stack frame. Poisoning stack frames before returns [1] makes the issue appear on x86_64 as well. [1] https://github.com/iii-i/llvm-project/commits/msan-poison-allocas-before-returning-2024-06-12/ Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Alexander Potapenko <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> Cc: Alexander Gordeev <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Cc: Marco Elver <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masami Hiramatsu (Google) <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03  mm: extend rmap flags arguments for folio_add_new_anon_rmap  (Barry Song, 1 file, -1/+1)
Patch series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()", v2. This patchset is preparatory work for mTHP swapin. folio_add_new_anon_rmap() assumes that new anon rmaps are always exclusive. However, this assumption doesn’t hold true for cases like do_swap_page(), where a new anon might be added to the swapcache and is not necessarily exclusive. The patchset extends the rmap flags to allow folio_add_new_anon_rmap() to handle both exclusive and non-exclusive new anon folios. The do_swap_page() function is updated to use this extended API with rmap flags. Consequently, all new anon folios now consistently use folio_add_new_anon_rmap(). The special case for !folio_test_anon() in __folio_add_anon_rmap() can be safely removed. In conclusion, new anon folios always use folio_add_new_anon_rmap(), regardless of exclusivity. Old anon folios continue to use __folio_add_anon_rmap() via folio_add_anon_rmap_pmd() and folio_add_anon_rmap_ptes(). This patch (of 3): In the case of a swap-in, a new anonymous folio is not necessarily exclusive. This patch updates the rmap flags to allow a new anonymous folio to be treated as either exclusive or non-exclusive. To maintain the existing behavior, we always use EXCLUSIVE as the default setting. [[email protected]: cleanup and constifications per David and akpm] [[email protected]: fix missing doc for flags of folio_add_new_anon_rmap()] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: enhance doc for extend rmap flags arguments for folio_add_new_anon_rmap] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Barry Song <[email protected]> Suggested-by: David Hildenbrand <[email protected]> Tested-by: Shuai Yuan <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Chris Li <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03  mm: optimize the redundant loop of mm_update_owner_next()  (Jinliang Zheng, 1 file, -0/+2)
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or /proc or ptrace or page migration (get_task_mm()), it is impossible to find an appropriate task_struct in the loop whose mm_struct is the same as the target mm_struct. If the above race condition is combined with the stress-ng-zombie and stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in write_lock_irq() for tasklist_lock. Recognize this situation in advance and exit early. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jinliang Zheng <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Tycho Andersen <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
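Based on that description, the early exit is presumably a cheap mm_users check before the task-list walk; a hedged sketch (the exact condition, and whether mm->owner is cleared here, are assumptions):
    /* in mm_update_next_owner(), before scanning children/siblings/all tasks */
    if (atomic_read(&mm->mm_users) <= 1) {
            /* nobody else is using this mm, so searching for a new owner is pointless */
            WRITE_ONCE(mm->owner, NULL);
            return;
    }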
2024-07-03  mm: remove the implementation of swap_free() and always use swap_free_nr()  (Barry Song, 1 file, -3/+2)
To streamline maintenance efforts, we propose removing the implementation of swap_free(). Instead, we can simply invoke swap_free_nr() with nr set to 1. swap_free_nr() is designed with a bitmap consisting of only one long, resulting in overhead that can be ignored for cases where nr equals 1. A prime candidate for leveraging swap_free_nr() lies within kernel/power/swap.c. Implementing this change facilitates the adoption of batch processing for hibernation. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Barry Song <[email protected]> Suggested-by: "Huang, Ying" <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Chris Li <[email protected]> Reviewed-by: Ryan Roberts <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Len Brown <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Andreas Larsson <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Chuanhua Han <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Gao Xiang <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kairui Song <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
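With the separate implementation gone, swap_free() presumably survives as a trivial inline wrapper along these lines:
    static inline void swap_free(swp_entry_t entry)
    {
            /* nr == 1 keeps swap_free_nr() on its single-long bitmap fast path */
            swap_free_nr(entry, 1);
    }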
2024-07-03  tick/sched: Combine WARN_ON_ONCE and print_once  (Anna-Maria Behnsen, 1 file, -4/+4)
When the WARN_ON_ONCE() triggers, the printk() of the additional information related to the warning will not happen in print level "warn". When reading dmesg with a restriction to level "warn", the information published by the printk_once() will not show up there. Transform WARN_ON_ONCE() and printk_once() into a WARN_ONCE(). Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
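The transformation is the standard one; a generic sketch (the condition and message here are illustrative, not the actual tick/sched code):
    /* before: warning and explanatory text are emitted by two different mechanisms */
    if (WARN_ON_ONCE(expires < now))
            printk_once("tick: expiry is in the past, clamping to now\n");

    /* after: one WARN_ONCE() keeps the explanation at the warning's print level */
    WARN_ONCE(expires < now, "tick: expiry is in the past, clamping to now\n");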
2024-07-03  mm: optimize the redundant loop of mm_update_owner_next()  (Jinliang Zheng, 1 file, -0/+2)
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or /proc or ptrace or page migration (get_task_mm()), it is impossible to find an appropriate task_struct in the loop whose mm_struct is the same as the target mm_struct. If the above race condition is combined with the stress-ng-zombie and stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in write_lock_irq() for tasklist_lock. Recognize this situation in advance and exit early. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jinliang Zheng <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Tycho Andersen <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-07-03  cgroup: Protect css->cgroup write under css_set_lock  (Waiman Long, 1 file, -1/+1)
The writing of css->cgroup associated with the cgroup root in rebind_subsystems() is currently protected only by cgroup_mutex. However, the reading of css->cgroup in both proc_cpuset_show() and proc_cgroup_show() is protected just by css_set_lock. That makes the readers susceptible to racing problems like data tearing or caching. It is also a problem that can be reported by KCSAN. This can be fixed by using READ_ONCE() and WRITE_ONCE() to access css->cgroup. Alternatively, the writing of css->cgroup can be moved under css_set_lock as well which is done by this patch. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2024-07-03  cgroup/misc: Introduce misc.peak  (Xiu Jianfeng, 1 file, -0/+41)
Introduce misc.peak to record the historical maximum usage of the resource, as in some scenarios the value of misc.max could be adjusted based on the peak usage of the resource. Signed-off-by: Xiu Jianfeng <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2024-07-03  mm/slab: Plumb kmem_buckets into __do_kmalloc_node()  (Kees Cook, 1 file, -0/+1)
Introduce CONFIG_SLAB_BUCKETS which provides the infrastructure to support separated kmalloc buckets (in the following kmem_buckets_create() patches and future codetag-based separation). Since this will provide a mitigation for a very common case of exploits, it is recommended to enable this feature for general purpose distros. By default, the new Kconfig will be enabled if CONFIG_SLAB_FREELIST_HARDENED is enabled (and it is added to the hardening.config Kconfig fragment). To be able to choose which buckets to allocate from, make the buckets available to the internal kmalloc interfaces by adding them as the second argument, rather than depending on the buckets being chosen from the fixed set of global buckets. Where the bucket is not available, pass NULL, which means "use the default system kmalloc bucket set" (the prior existing behavior), as implemented in kmalloc_slab(). To avoid adding the extra argument when !CONFIG_SLAB_BUCKETS, only the top-level macros and static inlines use the buckets argument (where they are stripped out and compiled out respectively). The actual extern functions can then be built without the argument, and the internals fall back to the global kmalloc buckets unconditionally. Co-developed-by: Vlastimil Babka <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2024-07-02  workqueue: Simplify goto statement  (Lai Jiangshan, 1 file, -8/+3)
Use a simple if-statement to replace the cumbersome goto-statement in workqueue_set_unbound_cpumask(). Cc: Waiman Long <[email protected]> Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2024-07-02  workqueue: Update cpumasks after only applying it successfully  (Lai Jiangshan, 1 file, -4/+6)
Make workqueue_unbound_exclude_cpumask() and workqueue_set_unbound_cpumask() only update wq_isolated_cpumask and wq_requested_unbound_cpumask when workqueue_apply_unbound_cpumask() returns successfully. Fixes: fe28f631fa94("workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs from wq_unbound_cpumask") Cc: Waiman Long <[email protected]> Signed-off-by: Lai Jiangshan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2024-07-02  bpf, devmap: Add .map_alloc_check  (Florian Lehner, 1 file, -10/+17)
Use the .map_alloc_check callback to perform allocation checks before allocating memory for the devmap. Signed-off-by: Florian Lehner <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-07-02  bpf: Fix atomic probe zero-extension  (Ilya Leoshkevich, 1 file, -1/+2)
Zero-extending results of atomic probe operations fails with: verifier bug. zext_dst is set, but no reg is defined The problem is that insn_def_regno() handles BPF_ATOMICs, but not BPF_PROBE_ATOMICs. Fix by adding the missing condition. Fixes: d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") Signed-off-by: Ilya Leoshkevich <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-07-02  livepatch: Replace snprintf() with sysfs_emit()  (Yafang Shao, 1 file, -3/+2)
Let's use sysfs_emit() instead of snprintf(). Suggested-by: Miroslav Benes <[email protected]> Signed-off-by: Yafang Shao <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Miroslav Benes <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]>
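The pattern, generically (attribute and field names are illustrative):
    /* before */
    return snprintf(buf, PAGE_SIZE, "%d\n", patch->enabled);

    /* after: sysfs_emit() already knows the buffer is a full sysfs page */
    return sysfs_emit(buf, "%d\n", patch->enabled);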
2024-07-02livepatch: Add "replace" sysfs attributeYafang Shao1-0/+12
There are situations when it might make sense to combine livepatches with and without the atomic replace on the same system. For example, the livepatch without the atomic replace might provide a hotfix or extra tuning. Managing livepatches on such systems might be challenging. And the information which of the installed livepatches do not use the atomic replace would be useful. Add new sysfs interface 'replace'. It works as follows:
    $ cat /sys/kernel/livepatch/livepatch-non_replace/replace
    0
    $ cat /sys/kernel/livepatch/livepatch-replace/replace
    1
[ commit log improved by Petr ] Signed-off-by: Yafang Shao <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Miroslav Benes <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]>
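A sketch of what the show callback behind this attribute may look like, assuming the usual livepatch sysfs layout (a struct klp_patch embedding a kobject and carrying a 'replace' flag):
    static ssize_t replace_show(struct kobject *kobj,
                                struct kobj_attribute *attr, char *buf)
    {
            struct klp_patch *patch = container_of(kobj, struct klp_patch, kobj);

            return sysfs_emit(buf, "%d\n", patch->replace);
    }
    static struct kobj_attribute replace_kobj_attr = __ATTR_RO(replace);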
2024-07-02  net: Move flush list retrieval to where it is used.  (Sebastian Andrzej Siewior, 2 files, -3/+6)
The bpf_net_ctx_get_.*_flush_list() are used at the top of the function. This means the variable is always assigned even if unused. By moving the function to where it is used, it is possible to delay the initialisation until it is unavoidable. Not sure how much this gains in reality but by looking at bq_enqueue() (in devmap.c) gcc pushes one register less to the stack. \o/. Move flush list retrieval to where it is used. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-02  net: Optimize xdp_do_flush() with bpf_net_context infos.  (Sebastian Andrzej Siewior, 2 files, -24/+2)
Every NIC driver utilizing XDP should invoke xdp_do_flush() after processing all packets. With the introduction of the bpf_net_context logic the flush lists (for dev, CPU-map and xsk) are lazily initialized only if used. However xdp_do_flush() tries to flush all three of them, so all three lists are always initialized and the likely empty lists are "iterated". Without the usage of XDP but with CONFIG_DEBUG_NET the lists are also initialized due to xdp_do_check_flushed(). Jakub suggested utilizing the hints in bpf_net_context and avoiding invoking the flush function. This also avoids initializing the lists which are otherwise unused. Introduce bpf_net_ctx_get_all_used_flush_lists() to return the individual list if not-empty. Use the logic in xdp_do_flush() and xdp_do_check_flushed(). Remove the no longer needed .*_check_flush(). Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-02  net: Remove task_struct::bpf_net_context init on fork.  (Sebastian Andrzej Siewior, 1 file, -1/+0)
There is no clone() invocation within a bpf_net_ctx_…() block. Therefore the task_struct::bpf_net_context has always to be NULL and an explicit initialisation is not required. Remove the NULL assignment in the clone() path. Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-07-01  fgraph: Use str_plural() in test_graph_storage_single()  (Jiapeng Chong, 1 file, -1/+1)
Use existing str_plural() function rather than duplicating its implementation. ./kernel/trace/trace_selftest.c:880:56-60: opportunity for str_plural(size). Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Reported-by: Abaci Robot <[email protected]> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=9349 Signed-off-by: Jiapeng Chong <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
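str_plural() (from <linux/string_choices.h>) simply returns "" or "s"; the change is of this shape (the surrounding format string is illustrative, not the actual selftest line):
    /* before */
    pr_cont("%d element%s", size, size == 1 ? "" : "s");

    /* after */
    pr_cont("%d element%s", size, str_plural(size));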
2024-07-01  rtla/osnoise: set the default threshold to 1us  (Luis Claudio R. Goncalves, 1 file, -2/+2)
Change the default threshold for osnoise to 1us, so that any noise equal to or above this value is recorded. Let the user set a higher threshold if necessary. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Jonathan Corbet <[email protected]> Suggested-by: Daniel Bristot de Oliveira <[email protected]> Reviewed-by: Clark Williams <[email protected]> Signed-off-by: Luis Claudio R. Goncalves <[email protected]> Acked-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-07-01  sched_ext: Swap argument positions in kcalloc() call to avoid compiler warning  (Tejun Heo, 1 file, -1/+1)
alloc_exit_info() calls kcalloc() but puts in the size of the element as the first argument which triggers the following gcc warning: kernel/sched/ext.c:3815:32: warning: ‘kmalloc_array_noprof’ sizes specified with ‘sizeof’ in the earlier argument and not in the later argument [-Wcalloc-transposed-args] Fix it by swapping the positions of the first two arguments. No functional changes. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Vishal Chourasia <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
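kcalloc() expects (count, element size, flags), and newer gcc warns when a sizeof() shows up in the count position. The fix is presumably of this form (the exact allocation inside alloc_exit_info() is assumed, not quoted):
    /* before: sizeof() as the first argument triggers -Wcalloc-transposed-args */
    ei = kcalloc(sizeof(*ei), 1, GFP_KERNEL);

    /* after: count first, element size second */
    ei = kcalloc(1, sizeof(*ei), GFP_KERNEL);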
2024-07-01  bpf: Use precise image size for struct_ops trampoline  (Pu Lehui, 1 file, -1/+1)
For trampoline using bpf_prog_pack, we need to generate a rw_image buffer with size of (image_end - image). For regular trampoline, we use the precise image size generated by arch_bpf_trampoline_size to allocate rw_image. But for struct_ops trampoline, we allocate rw_image directly using close to PAGE_SIZE size. We do not need to allocate for that much, as the patch size is usually much smaller than PAGE_SIZE. Let's use precise image size for it too. Signed-off-by: Pu Lehui <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Björn Töpel <[email protected]> #riscv Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-07-01  sched: Move psi_account_irqtime() out of update_rq_clock_task() hotpath  (John Stultz, 4 files, -10/+30)
It was reported that in moving to 6.1, a larger than 10% regression was seen in the performance of clock_gettime(CLOCK_THREAD_CPUTIME_ID,...). Using a simple reproducer, I found:
    5.10:
    100000000 calls in 24345994193 ns => 243.460 ns per call
    100000000 calls in 24288172050 ns => 242.882 ns per call
    100000000 calls in 24289135225 ns => 242.891 ns per call
    6.1:
    100000000 calls in 28248646742 ns => 282.486 ns per call
    100000000 calls in 28227055067 ns => 282.271 ns per call
    100000000 calls in 28177471287 ns => 281.775 ns per call
The cause of this was finally narrowed down to the addition of psi_account_irqtime() in update_rq_clock_task(), in commit 52b1364ba0b1 ("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure"). In my initial attempt to resolve this, I leaned towards moving all accounting work out of the clock_gettime() call path, but it wasn't very pretty, so it will have to wait for a later deeper rework. Instead, Peter shared this approach: Rework psi_account_irqtime() to use its own psi_irq_time base for accounting, and move it out of the hotpath, calling it instead from sched_tick() and __schedule(). In testing this, we found the importance of ensuring psi_account_irqtime() is run under the rq_lock, which Johannes Weiner helpfully explained, so also add some lockdep annotations to make that requirement clear. With this change the performance is back in line with 5.10:
    6.1+fix:
    100000000 calls in 24297324597 ns => 242.973 ns per call
    100000000 calls in 24318869234 ns => 243.189 ns per call
    100000000 calls in 24291564588 ns => 242.916 ns per call
Reported-by: Jimmy Shiu <[email protected]> Originally-by: Peter Zijlstra <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Reviewed-by: Qais Yousef <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-07-01  sched/deadline: Fix task_struct reference leak  (Wander Lairson Costa, 1 file, -1/+6)
During the execution of the following stress test with linux-rt:
    stress-ng --cyclic 30 --timeout 30 --minimize --quiet
kmemleak frequently reported a memory leak concerning the task_struct:
    unreferenced object 0xffff8881305b8000 (size 16136):
      comm "stress-ng", pid 614, jiffies 4294883961 (age 286.412s)
      object hex dump (first 32 bytes):
        02 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00  .@..............
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      debug hex dump (first 16 bytes):
        53 09 00 00 00 00 00 00 00 00 00 00 00 00 00 00  S...............
      backtrace:
        [<00000000046b6790>] dup_task_struct+0x30/0x540
        [<00000000c5ca0f0b>] copy_process+0x3d9/0x50e0
        [<00000000ced59777>] kernel_clone+0xb0/0x770
        [<00000000a50befdc>] __do_sys_clone+0xb6/0xf0
        [<000000001dbf2008>] do_syscall_64+0x5d/0xf0
        [<00000000552900ff>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
The issue occurs in start_dl_timer(), which increments the task_struct reference count and sets a timer. The timer callback, dl_task_timer, is supposed to decrement the reference count upon expiration. However, if enqueue_task_dl() is called before the timer expires and cancels it, the reference count is not decremented, leading to the leak. This patch fixes the reference leak by ensuring the task_struct reference count is properly decremented when the timer is canceled. Fixes: feff2e65efd8 ("sched/deadline: Unthrottle PI boosted threads while enqueuing") Signed-off-by: Wander Lairson Costa <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Juri Lelli <[email protected]> Link: https://lore.kernel.org/r/[email protected]
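The shape of the fix, as described: when an armed dl_timer is cancelled before it fires, drop the reference that start_dl_timer() took. A hedged sketch (the real call site and any dl_server special-casing may differ):
    /*
     * hrtimer_try_to_cancel() returns 1 only when it removed a timer that was
     * still queued, i.e. dl_task_timer() will never run and can no longer drop
     * the task reference taken by start_dl_timer().
     */
    if (hrtimer_try_to_cancel(&p->dl.dl_timer) == 1)
            put_task_struct(p);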
2024-07-01Revert "sched/fair: Make sure to try to detach at least one movable task"Josh Don1-9/+3
This reverts commit b0defa7ae03ecf91b8bfd10ede430cff12fcbd06. b0defa7ae03ec changed the load balancing logic to ignore env.max_loop if all tasks examined to that point were pinned. The goal of the patch was to make it more likely to be able to detach a task buried in a long list of pinned tasks. However, this has the unfortunate side effect of creating an O(n) iteration in detach_tasks(), as we now must fully iterate every task on a cpu if all or most are pinned. Since this load balance code is done with rq lock held, and often in softirq context, it is very easy to trigger hard lockups. We observed such hard lockups with a user who affined O(10k) threads to a single cpu. When I discussed this with Vincent he initially suggested that we keep the limit on the number of tasks to detach, but increase the number of tasks we can search. However, after some back and forth on the mailing list, he recommended we instead revert the original patch, as it seems likely no one was actually getting hit by the original issue. Fixes: b0defa7ae03e ("sched/fair: Make sure to try to detach at least one movable task") Signed-off-by: Josh Don <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-06-30  Merge tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty  (Linus Torvalds, 4 files, -172/+5)
Pull tty / serial / console fixes from Greg KH:
"Here are a bunch of fixes/reverts for 6.10-rc6. Include in here are:
 - revert the bunch of tty/serial/console changes that landed in -rc1 that didn't quite work properly yet. Everyone agreed to just revert them for now and will work on making them better for a future release instead of trying to quick fix the existing changes this late in the release cycle
 - 8250 driver port count bugfix
 - Other tiny serial port bugfixes for reported issues
All of these have been in linux-next this week with no reported issues"
* tag 'tty-6.10-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  Revert "printk: Save console options for add_preferred_console_match()"
  Revert "printk: Don't try to parse DEVNAME:0.0 console options"
  Revert "printk: Flag register_console() if console is set on command line"
  Revert "serial: core: Add support for DEVNAME:0.0 style naming for kernel console"
  Revert "serial: core: Handle serial console options"
  Revert "serial: 8250: Add preferred console in serial8250_isa_init_ports()"
  Revert "Documentation: kernel-parameters: Add DEVNAME:0.0 format for serial ports"
  Revert "serial: 8250: Fix add preferred console for serial8250_isa_init_ports()"
  Revert "serial: core: Fix ifdef for serial base console functions"
  serial: bcm63xx-uart: fix tx after conversion to uart_port_tx_limited()
  serial: core: introduce uart_port_tx_limited_flags()
  Revert "serial: core: only stop transmit when HW fifo is empty"
  serial: imx: set receiver level before starting uart
  tty: mcf: MCF54418 has 10 UARTS
  serial: 8250_omap: Implementation of Errata i2310
  tty: serial: 8250: Fix port count mismatch with the device
2024-06-30  Merge tag 'smp_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -4/+7)
Pull smp fixes from Borislav Petkov:
 - Fix "nosmp" and "maxcpus=0" after the parallel CPU bringup work went in and broke them
 - Make sure CPU hotplug dynamic prepare states are actually executed
* tag 'smp_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Fix broken cmdline "nosmp" and "maxcpus=0"
  cpu/hotplug: Fix dynstate assignment in __cpuhp_setup_state_cpuslocked()
2024-06-30  Merge tag 'timers_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -0/+2)
Pull timer fix from Borislav Petkov:
 - Warn when an hrtimer doesn't get a callback supplied
* tag 'timers_urgent_for_v6.10_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Prevent queuing of hrtimer without a function callback
2024-06-28  resource: add missing MODULE_DESCRIPTION()  (Jeff Johnson, 1 file, -0/+1)
Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/resource_kunit.o Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Jeff Johnson <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
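The fix is a one-line addition to kernel/resource_kunit.c of this form (the description string here is illustrative, not the exact one used):
    MODULE_DESCRIPTION("KUnit tests for the resource API");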
2024-06-28  cpumask: Add enabled cpumask for present CPUs that can be brought online  (James Morse, 1 file, -0/+3)
The 'offline' file in sysfs shows all offline CPUs, including those that aren't present. User-space is expected to remove not-present CPUs from this list to learn which CPUs could be brought online. CPUs can be present but not-enabled. These CPUs can't be brought online until the firmware policy changes, which comes with an ACPI notification that will register the CPUs. With only the offline and present files, user-space is unable to determine which CPUs it can try to bring online. Add a new CPU mask that shows this based on all the registered CPUs. Signed-off-by: James Morse <[email protected]> Tested-by: Miguel Luis <[email protected]> Tested-by: Vishnu Pajjuri <[email protected]> Tested-by: Jianyong Wu <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Gavin Shan <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2024-06-28  cgroup/cpuset: Prevent UAF in proc_cpuset_show()  (Chen Ridong, 1 file, -4/+9)
A UAF can happen when /proc/cpuset is read, as reported in [1]. This can be reproduced by the following methods:
 1. add an mdelay(1000) before acquiring the cgroup_lock in the cgroup_path_ns function.
 2. $ cat /proc/<pid>/cpuset repeatedly.
 3. $ mount -t cgroup -o cpuset cpuset /sys/fs/cgroup/cpuset/ and $ umount /sys/fs/cgroup/cpuset/ repeatedly.
The race that causes this bug can be shown as below:
    (umount)                       |  (cat /proc/<pid>/cpuset)
    css_release                    |  proc_cpuset_show
    css_release_work_fn            |  css = task_get_css(tsk, cpuset_cgrp_id);
    css_free_rwork_fn              |  cgroup_path_ns(css->cgroup, ...);
    cgroup_destroy_root            |    mutex_lock(&cgroup_mutex);
      rebind_subsystems            |
      cgroup_free_root             |
                                   |    // cgrp was freed, UAF
                                   |    cgroup_path_ns_locked(cgrp,..);
When the cpuset is initialized, the root node top_cpuset.css.cgrp will point to &cgrp_dfl_root.cgrp. In cgroup v1, the mount operation will allocate cgroup_root, and top_cpuset.css.cgrp will point to the allocated &cgroup_root.cgrp. When the umount operation is executed, top_cpuset.css.cgrp will be rebound to &cgrp_dfl_root.cgrp. The problem is that when rebinding to cgrp_dfl_root, there are cases where the cgroup_root allocated by setting up the root for cgroup v1 is cached. This could lead to a Use-After-Free (UAF) if it is subsequently freed. The descendant cgroups of cgroup v1 can only be freed after the css is released. However, the css of the root will never be released, yet the cgroup_root should be freed when it is unmounted. This means that obtaining a reference to the css of the root does not guarantee that css.cgrp->root will not be freed. Fix this problem by using rcu_read_lock in proc_cpuset_show(). As cgroup_root is kfree_rcu after commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe"), css->cgroup won't be freed during the critical section. To call cgroup_path_ns_locked, css_set_lock is needed, so it is safe to replace task_get_css with task_css. [1] https://syzkaller.appspot.com/bug?extid=9b1ff7be974a403aa4cd Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces") Signed-off-by: Chen Ridong <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
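A sketch of the fixed read side as described: hold rcu_read_lock() and css_set_lock, use task_css() instead of task_get_css(), and call the _locked path helper (argument list abridged and assumed):
    rcu_read_lock();
    spin_lock_irq(&css_set_lock);
    css = task_css(tsk, cpuset_cgrp_id);
    retval = cgroup_path_ns_locked(css->cgroup, buf, PATH_MAX,
                                   current->nsproxy->cgroup_ns);
    spin_unlock_irq(&css_set_lock);
    rcu_read_unlock();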
2024-06-28  seccomp: release task filters when the task exits  (Andrei Vagin, 2 files, -6/+20)
Previously, seccomp filters were released in release_task(), which required the process to exit and its zombie to be collected. However, exited threads/processes can't trigger any seccomp events, making it more logical to release filters upon task exit. This adjustment simplifies scenarios where a parent is tracing its child process. The parent process can now handle all events from a seccomp listening descriptor and then call wait to collect a child zombie. seccomp_filter_release takes the siglock to avoid races with seccomp_sync_threads. There was an idea to bypass taking the lock by checking PF_EXITING, but it can be set without holding siglock if threads have SIGNAL_GROUP_EXIT. This means it can happen concurrently with seccomp_filter_release. This change also fixes another minor problem. Suppose that a group leader installs the new filter without SECCOMP_FILTER_FLAG_TSYNC, exits, and becomes a zombie. Without this change, SECCOMP_FILTER_FLAG_TSYNC from any other thread can never succeed, since seccomp_can_sync_threads() will check the zombie leader and is_ancestor() will fail. Reviewed-by: Oleg Nesterov <[email protected]> Signed-off-by: Andrei Vagin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Tycho Andersen <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2024-06-28  seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV when all users have exited  (Andrei Vagin, 1 file, -1/+6)
SECCOMP_IOCTL_NOTIF_RECV promptly returns when a seccomp filter becomes unused, as a filter without users can't trigger any events. Previously, event listeners had to rely on epoll to detect when all processes had exited. The change is based on the 'commit 99cdb8b9a573 ("seccomp: notify about unused filter")' which implemented (E)POLLHUP notifications. Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: Andrei Vagin <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Tycho Andersen <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2024-06-28  rcu/exp: Remove redundant full memory barrier at the end of GP  (Frederic Weisbecker, 1 file, -2/+6)
A full memory barrier is necessary at the end of the expedited grace period to order: 1) The grace period completion (pictured by the GP sequence number) with all preceding accesses. This pairs with rcu_seq_end() performed by the concurrent kworker. 2) The grace period completion and subsequent post-GP update side accesses. Pairs again against rcu_seq_end(). This full barrier is already provided by the final sync_exp_work_done() test, making the subsequent explicit one redundant. Remove it and improve comments. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Boqun Feng <[email protected]> Reviewed-by: Neeraj Upadhyay <[email protected]>
2024-06-28  rcu: Remove full memory barrier on RCU stall printout  (Frederic Weisbecker, 2 files, -12/+2)
RCU stall printout fetches the EQS state of a CPU with a preceding full memory barrier. However there is nothing to order this read against at this debugging stage. It is inherently racy when performed remotely. Do a plain read instead. This was the last user of rcu_dynticks_snap(). Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Boqun Feng <[email protected]> Reviewed-by: Neeraj Upadhyay <[email protected]>