path: root/kernel
Age | Commit message | Author | Files | Lines
2021-12-09 | rcu-tasks: Use fewer callbacks queues if callback flood ends | Paul E. McKenney | 1 | -2/+46
By default, when lock contention is encountered, the RCU Tasks flavors of RCU switch to using per-CPU queueing. However, if the callback flood ends, per-CPU queueing continues to be used, which introduces significant additional overhead, especially for callback invocation, which fans out a series of workqueue handlers.

This commit therefore switches back to single-queue operation if at the beginning of a grace period there are very few callbacks. The definition of "very few" is set by the rcupdate.rcu_task_collapse_lim module parameter, which defaults to 10. This switch happens in two phases, with the first phase causing future callbacks to be enqueued on CPU 0's queue, but with all queues continuing to be checked for grace periods and callback invocation. The second phase checks to see if an RCU grace period has elapsed and if all remaining RCU-Tasks callbacks are queued on CPU 0. If so, only CPU 0 is checked for future grace periods and callback operation. Of course, the return of contention anywhere during this process will result in returning to per-CPU callback queueing.

Reported-by: Martin Lau <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
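To make the two-phase collapse concrete, here is a rough sketch of the check as it might run at the start of a grace period. This is illustrative only: the helper name rcu_tasks_maybe_collapse() and the exact shift computation are assumptions, while rcu_task_collapse_lim, rcu_task_enqueue_lim, and the ->percpu_enqueue_shift/->percpu_enqueue_lim/->percpu_dequeue_lim fields are the ones introduced by this patch series.

  /* Sketch: shrink back toward CPU 0's queue once a callback flood ends. */
  static void rcu_tasks_maybe_collapse(struct rcu_tasks *rtp, long ncbs)
  {
          if (rcu_task_enqueue_lim >= 0)          /* Admin pinned the queue count. */
                  return;
          if (ncbs >= rcu_task_collapse_lim)      /* Still flooding; keep per-CPU queues. */
                  return;
          /* Phase 1: route all future enqueues to CPU 0's queue only. */
          WRITE_ONCE(rtp->percpu_enqueue_shift, order_base_2(nr_cpu_ids));
          WRITE_ONCE(rtp->percpu_enqueue_lim, 1);
          /*
           * Phase 2 happens later, once a grace period has elapsed and all
           * remaining callbacks sit on CPU 0: only then is ->percpu_dequeue_lim
           * reduced so that just CPU 0 is scanned for grace periods and
           * callback invocation.
           */
  }

Contention at any point during this process simply reverses the switch, as described above.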
2021-12-09 | rcu-tasks: Use separate ->percpu_dequeue_lim for callback dequeueing | Paul E. McKenney | 1 | -5/+10
Decreasing the number of callback queues is a bit tricky because it is necessary to handle callbacks that were queued before the number of queues decreased, but which were not ready to invoke until afterwards. This commit takes a first step in this direction by maintaining a separate ->percpu_dequeue_lim to control callback dequeueing, in addition to the existing ->percpu_enqueue_lim which now controls only enqueueing. Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-09 | rcu-tasks: Use more callback queues if contention encountered | Paul E. McKenney | 1 | -4/+23
The rcupdate.rcu_task_enqueue_lim module parameter allows system administrators to tune the number of callback queues used by the RCU Tasks flavors. However, if callback storms are infrequent, it would be better to operate with a single queue on a given system unless and until that system actually needed more queues. Systems not needing more queues can then avoid the overhead of checking the extra queues and especially avoid the overhead of fanning workqueue handlers out to all CPUs to invoke callbacks.

This commit therefore switches to using all the CPUs' callback queues if call_rcu_tasks_generic() encounters too much lock contention. The amount of lock contention to tolerate defaults to 100 contended lock acquisitions per jiffy, and can be adjusted using the new rcupdate.rcu_task_contend_lim module parameter. Such switching is undertaken only if the rcupdate.rcu_task_enqueue_lim module parameter is negative, which is its default value (-1). This allows savvy system administrators to set the number of queues to some known good value and to not have to worry about the kernel doing any second-guessing.

[ paulmck: Apply feedback from Guillaume Tucker and kernelci. ]
Reported-by: Martin Lau <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
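For illustration, the contention-driven switch might look roughly like the sketch below, called from call_rcu_tasks_generic() once the per-jiffy retry count (see the trylock-counting commit further down) crosses the limit. The helper name and the exact locking details are assumptions; rcu_task_contend_lim, rcu_task_enqueue_lim, and the per-CPU fields are the ones introduced by this series.

  /* Sketch: expand from a single queue to one queue per CPU. */
  static void rcu_tasks_maybe_expand(struct rcu_tasks *rtp,
                                     struct rcu_tasks_percpu *rtpcp)
  {
          unsigned long flags;

          if (rcu_task_enqueue_lim >= 0)          /* Admin pinned the queue count. */
                  return;
          if (rtpcp->rtp_n_lock_retries <= rcu_task_contend_lim)
                  return;                         /* Not enough contention this jiffy. */
          raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags);
          if (rtp->percpu_enqueue_lim != nr_cpu_ids) {
                  WRITE_ONCE(rtp->percpu_enqueue_shift, 0);   /* cpu >> 0 == cpu. */
                  WRITE_ONCE(rtp->percpu_enqueue_lim, nr_cpu_ids);
                  pr_info("Switching %s to per-CPU callback queuing.\n", rtp->name);
          }
          raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
  }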
2021-12-09 | rcu-tasks: Avoid raw-spinlocked wakeups from call_rcu_tasks_generic() | Paul E. McKenney | 1 | -1/+16
If the caller of call_rcu_tasks(), call_rcu_tasks_rude(), or call_rcu_tasks_trace() holds a raw spinlock, and if call_rcu_tasks_generic() then determines that the grace-period kthread must be awakened, the wakeup might acquire a normal spinlock while a raw spinlock is held. This results in lockdep splats when the kernel is built with CONFIG_PROVE_RAW_LOCK_NESTING=y. This commit therefore defers the wakeup using irq_work_queue(). It would be nice to invoke the wakeup directly when a raw spinlock is not held, but there is currently no way to check for this in all kernels.

Reported-by: Martin Lau <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
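The deferred wakeup follows the standard irq_work pattern; a minimal sketch is below, with the field and function names (rtp_irq_work, rtp_wq, rcu_tasks_iw_wakeup) used as placeholders rather than the exact symbols in the patch.

  #include <linux/irq_work.h>

  /* Runs later in IRQ context, after the enqueuer has dropped its raw
   * spinlock, so taking the wait queue's normal spinlock is safe even
   * under CONFIG_PROVE_RAW_LOCK_NESTING=y. */
  static void rcu_tasks_iw_wakeup(struct irq_work *iwp)
  {
          struct rcu_tasks_percpu *rtpcp =
                  container_of(iwp, struct rcu_tasks_percpu, rtp_irq_work);

          wake_up(&rtpcp->rtp_wq);
  }

  /* In call_rcu_tasks_generic(), instead of calling wake_up() directly: */
  if (needwake)
          irq_work_queue(&rtpcp->rtp_irq_work);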
2021-12-09 | rcu-tasks: Count trylocks to estimate call_rcu_tasks() contention | Paul E. McKenney | 1 | -1/+14
This commit converts the unconditional raw_spin_lock_rcu_node() lock acquisition in call_rcu_tasks_generic() to a trylock followed by an unconditional acquisition if the trylock fails. If the trylock fails, the failure is counted, but the count is reset to zero on each new jiffy. This statistic will be used to determine when to move from a single callback queue to per-CPU callback queues. Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
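The counting itself could look roughly like this inside call_rcu_tasks_generic() (a sketch; the rtp_jiffies and rtp_n_lock_retries field names follow this series but are not guaranteed verbatim):

  if (!raw_spin_trylock_rcu_node(rtpcp)) {        /* irqs already disabled. */
          raw_spin_lock_rcu_node(rtpcp);          /* Contended: wait for the lock. */
          /* Count the contended acquisition, restarting the count
           * whenever a new jiffy begins. */
          if (rtpcp->rtp_jiffies != jiffies) {
                  rtpcp->rtp_jiffies = jiffies;
                  rtpcp->rtp_n_lock_retries = 0;
          }
          rtpcp->rtp_n_lock_retries++;
  }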
2021-12-09 | rcu-tasks: Add rcupdate.rcu_task_enqueue_lim to set initial queueing | Paul E. McKenney | 1 | -6/+18
This commit adds a rcupdate.rcu_task_enqueue_lim module parameter that sets the initial number of callback queues to use for the RCU Tasks family of RCU implementations. This parameter allows testing of various fanout values. Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
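A module parameter of this kind is declared in the usual way; a sketch follows (the exact default and permission bits are assumptions, and the -1 "let the kernel decide" convention comes from a later commit in this series):

  /* Visible read-only under /sys/module/rcupdate/parameters/, set at boot
   * with rcupdate.rcu_task_enqueue_lim=<n>. */
  static int rcu_task_enqueue_lim __read_mostly = -1;
  module_param(rcu_task_enqueue_lim, int, 0444);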
2021-12-09 | rcu-tasks: Make rcu_barrier_tasks*() handle multiple callback queues | Paul E. McKenney | 1 | -6/+64
Currently, rcu_barrier_tasks(), rcu_barrier_tasks_rude(), and rcu_barrier_tasks_trace() simply invoke the corresponding synchronize_rcu_tasks*() function. This works because there is only one callback queue. However, there will soon be multiple callback queues. This commit therefore scans the queues currently in use, entraining a callback on each non-empty queue. Sequence numbers and reference counts are used to synchronize this process in a manner similar to the approach taken by rcu_barrier(). Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-09 | rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations | Paul E. McKenney | 1 | -23/+52
If there is a flood of callbacks, it is necessary to put multiple CPUs to work invoking those callbacks. This commit therefore uses a workqueue-flooding approach to parallelize RCU Tasks callback execution. Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-09 | rcu-tasks: Abstract invocations of callbacks | Paul E. McKenney | 1 | -22/+34
This commit adds a rcu_tasks_invoke_cbs() function that invokes all ready callbacks on all of the per-CPU lists that are currently in use. Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-09 | rcu-tasks: Abstract checking of callback lists | Paul E. McKenney | 1 | -23/+38
This commit adds a rcu_tasks_need_gpcb() function that returns an indication of whether another grace period is required, and if no grace period is required, whether there are callbacks that need to be invoked. The function scans all per-CPU lists currently in use. Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-09 | wait: add wake_up_pollfree() | Eric Biggers | 1 | -0/+7
Several ->poll() implementations are special in that they use a waitqueue whose lifetime is the current task, rather than the struct file as is normally the case. This is okay for blocking polls, since a blocking poll occurs within one task; however, non-blocking polls require another solution. This solution is for the queue to be cleared before it is freed, using 'wake_up_poll(wq, EPOLLHUP | POLLFREE);'.

However, that has a bug: wake_up_poll() calls __wake_up() with nr_exclusive=1. Therefore, if there are multiple "exclusive" waiters, and the wakeup function for the first one returns a positive value, only that one will be called. That's *not* what's needed for POLLFREE; POLLFREE is special in that it really needs to wake up everyone.

Considering the three non-blocking poll systems:

- io_uring poll doesn't handle POLLFREE at all, so it is broken anyway.

- aio poll is unaffected, since it doesn't support exclusive waits. However, that's fragile, as someone could add this feature later.

- epoll doesn't appear to be broken by this, since its wakeup function returns 0 when it sees POLLFREE. But this is fragile.

Although there is a workaround (see epoll), it's better to define a function which always sends POLLFREE to all waiters. Add such a function. Also make it verify that the queue really becomes empty after all waiters have been woken up.

Reported-by: Linus Torvalds <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Eric Biggers <[email protected]>
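The kernel/ side of this change (the seven lines added to the wait code) plausibly boils down to a helper of the following shape: wake every waiter by passing nr_exclusive=0, then verify that POLLFREE handling really emptied the queue. This is a sketch consistent with the description above, not necessarily the exact function.

  void __wake_up_pollfree(struct wait_queue_head *wq_head)
  {
          /* nr_exclusive == 0: wake *all* waiters, exclusive or not. */
          __wake_up(wq_head, TASK_NORMAL, 0, poll_to_key(EPOLLHUP | POLLFREE));
          /* POLLFREE must have caused every waiter to remove itself. */
          WARN_ON_ONCE(waitqueue_active(wq_head));
  }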
2021-12-09 | rcu-tasks: Add a ->percpu_enqueue_lim to the rcu_tasks structure | Paul E. McKenney | 1 | -0/+4
This commit adds a ->percpu_enqueue_lim field to the rcu_tasks structure. This field contains two to the power of the ->percpu_enqueue_shift field, easing construction of iterators over the per-CPU queues that might contain RCU Tasks callbacks. Such iterators will be introduced in later commits. Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-09 | rcu-tasks: Inspect stalled task's trc state in locked state | Neeraj Upadhyay | 1 | -9/+34
On RCU tasks trace stall, inspect the RCU-tasks-trace-specific state of the stalled task in locked-down state, using try_invoke_on_locked_down_task(), to get a reliable trc state for a non-running stalled task.

This was tested using the following command:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 8 --configs TRACE01 \
--bootargs "rcutorture.torture_type=tasks-tracing rcutorture.stall_cpu=10 \
rcutorture.stall_cpu_block=1 rcupdate.rcu_task_stall_timeout=100" --trust-make

As expected, this produced the following console output for running and sleeping tasks.

[ 21.520291] INFO: rcu_tasks_trace detected stalls on tasks:
[ 21.521292] P85: ... nesting: 1N cpu: 2
[ 21.521966] task:rcu_torture_sta state:D stack:15080 pid: 85 ppid: 2 flags:0x00004000
[ 21.523384] Call Trace:
[ 21.523808]  __schedule+0x273/0x6e0
[ 21.524428]  schedule+0x35/0xa0
[ 21.524971]  schedule_timeout+0x1ed/0x270
[ 21.525690]  ? del_timer_sync+0x30/0x30
[ 21.526371]  ? rcu_torture_writer+0x720/0x720
[ 21.527106]  rcu_torture_stall+0x24a/0x270
[ 21.527816]  kthread+0x115/0x140
[ 21.528401]  ? set_kthread_struct+0x40/0x40
[ 21.529136]  ret_from_fork+0x22/0x30
[ 21.529766] 1 holdouts
[ 21.632300] INFO: rcu_tasks_trace detected stalls on tasks:
[ 21.632345] rcu_torture_stall end.
[ 21.633293] P85: .
[ 21.633294] task:rcu_torture_sta state:R running task stack:15080 pid: 85 ppid: 2 flags:0x00004000
[ 21.633299] Call Trace:
[ 21.633301]  ? vprintk_emit+0xab/0x180
[ 21.633306]  ? vprintk_emit+0x11a/0x180
[ 21.633308]  ? _printk+0x4d/0x69
[ 21.633311]  ? __default_send_IPI_shortcut+0x1f/0x40

[ paulmck: Update to new v5.16 task_call_func() name. ]

Signed-off-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-09 | rcu-tasks: Use spin_lock_rcu_node() and friends | Paul E. McKenney | 1 | -22/+20
This commit renames the rcu_tasks_percpu structure's ->cbs_pcpu_lock to ->lock and then uses spin_lock_rcu_node() and friends to acquire and release this lock, preparing for upcoming commits that will spread the grace-period process across multiple CPUs and kthreads. [ paulmck: Apply feedback from kernel test robot. ] Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-09 | tracing: Fix possible memory leak in __create_synth_event() error path | Miaoqian Lin | 1 | -5/+6
There are error paths in __create_synth_event(), after the argv is allocated, that fail to free it. Add a jump to free it when necessary.

Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Miaoqian Lin <[email protected]>
[ Fixed up the patch and change log ]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
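The usual shape of such a fix is a single cleanup label that every post-allocation error path jumps to. A simplified sketch follows (the signature is condensed, and parse_fields()/register_event() are placeholders standing in for the real work, not functions from the tracing code):

  static int __create_synth_event(const char *name, const char *raw_fields)
  {
          char **argv;
          int argc, ret;

          argv = argv_split(GFP_KERNEL, raw_fields, &argc);
          if (!argv)
                  return -ENOMEM;

          ret = parse_fields(argv, argc);         /* Placeholder for the real parsing. */
          if (ret)
                  goto err_free_argv;             /* Error paths now free argv too. */

          ret = register_event(name);             /* Placeholder. */
  err_free_argv:
          argv_free(argv);                        /* Freed on success and on error. */
          return ret;
  }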
2021-12-09 | genirq/msi: Handle PCI/MSI allocation fail in core code | Thomas Gleixner | 1 | -4/+25
Get rid of yet another irqdomain callback and let the core code return the already available information of how many descriptors could be allocated. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> # PCI Link: https://lore.kernel.org/r/[email protected]
2021-12-09 | genirq/msi, treewide: Use a named struct for PCI/MSI attributes | Thomas Gleixner | 1 | -2/+2
The unnamed struct sucks and is in the way of further cleanups. Stick the PCI related MSI data into a real data structure and cleanup all users. No functional change. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Acked-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-12-09 | genirq/msi: Fixup includes | Thomas Gleixner | 1 | -0/+1
Remove the kobject.h include from msi.h as it's not required and add a sysfs.h include to the core code instead. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-12-09 | genirq/msi: Remove unused domain callbacks | Thomas Gleixner | 1 | -5/+0
No users and there is no need to grow them. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected]
2021-12-09 | genirq/msi: Guard sysfs code | Thomas Gleixner | 1 | -0/+2
No point in building unused code when CONFIG_SYSFS=n. Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
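Guarding sysfs-only code typically follows the pattern below: compile the helpers only when CONFIG_SYSFS=y and provide empty inline stubs otherwise, so callers need no #ifdefs of their own. The function names here are generic placeholders, not the actual msi.c symbols.

  #ifdef CONFIG_SYSFS
  int msi_sysfs_populate(struct device *dev);
  void msi_sysfs_destroy(struct device *dev);
  #else  /* !CONFIG_SYSFS */
  static inline int msi_sysfs_populate(struct device *dev) { return 0; }
  static inline void msi_sysfs_destroy(struct device *dev) { }
  #endif /* !CONFIG_SYSFS */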
2021-12-08 | bpf: Remove redundant assignment to pointer t | Colin Ian King | 1 | -1/+1
The pointer t is being initialized with a value that is never read. The pointer is re-assigned a value a little later on, hence the initialization is redundant and can be removed.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2021-12-08 | Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf | Jakub Kicinski | 2 | -10/+3
Daniel Borkmann says:

====================
bpf 2021-12-08

We've added 12 non-merge commits during the last 22 day(s) which contain a total of 29 files changed, 659 insertions(+), 80 deletions(-).

The main changes are:

1) Fix an off-by-two error in packet range markings and also add a batch of new tests for coverage of these corner cases, from Maxim Mikityanskiy.

2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh.

3) Fix two functional regressions and a build warning related to BTF kfunc for modules, from Kumar Kartikeya Dwivedi.

4) Fix outdated code and docs regarding BPF's migrate_disable() use on non-PREEMPT_RT kernels, from Sebastian Andrzej Siewior.

5) Add missing includes in order to be able to detangle cgroup vs bpf header dependencies, from Jakub Kicinski.

6) Fix regression in BPF sockmap tests caused by missing detachment of progs from sockets when they are removed from the map, from John Fastabend.

7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF dispatcher, from Björn Töpel.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add selftests to cover packet access corner cases
  bpf: Fix the off-by-two error in range markings
  treewide: Add missing includes masked by cgroup -> bpf dependency
  tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
  bpf: Fix bpf_check_mod_kfunc_call for built-in modules
  bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
  mips, bpf: Fix reference to non-existing Kconfig symbol
  bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.
  Documentation/locking/locktypes: Update migrate_disable() bits.
  bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap
  bpf, sockmap: Attach map progs to psock early for feature probes
  bpf, x86: Fix "no previous prototype" warning
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2021-12-08 | sched/fair: Cleanup task_util and capacity type | Vincent Donnefort | 1 | -2/+3
task_util and capacity are comparable unsigned long values. There is no need for an intermediate implicit signed cast.

Signed-off-by: Vincent Donnefort <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
2021-12-08 | ftrace: Add cleanup to unregister_ftrace_direct_multi | Jiri Olsa | 1 | -0/+4
Adding ops cleanup to unregister_ftrace_direct_multi, so it can be reused in another register call. Link: https://lkml.kernel.org/r/[email protected] Cc: Ingo Molnar <[email protected]> Cc: Heiko Carstens <[email protected]> Fixes: f64dd4627ec6 ("ftrace: Add multi direct register/unregister interface") Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-12-08 | ftrace: Use direct_ops hash in unregister_ftrace_direct | Jiri Olsa | 1 | -1/+3
Now that we have the *direct_multi interface, the direct_functions hash is no longer owned just by direct_ops. It's also used by any other ftrace_ops passed to the *direct_multi interface. Thus, to find out that we are unregistering the last function from direct_ops, we need to check direct_ops's own hash directly.

Link: https://lkml.kernel.org/r/[email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Heiko Carstens <[email protected]>
Fixes: f64dd4627ec6 ("ftrace: Add multi direct register/unregister interface")
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-12-08 | dma-direct: add a dma_direct_use_pool helper | Christoph Hellwig | 1 | -6/+12
Add a helper to check if a potentially blocking operation should dip into the atomic pools. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]>
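Given the description, the helper presumably reduces to a single predicate along these lines (a sketch; the swiotlb check is an assumption based on the existing restricted-DMA handling, not confirmed by the text above):

  /* Should this allocation dip into the atomic pools instead of blocking? */
  static inline bool dma_direct_use_pool(struct device *dev, gfp_t gfp)
  {
          return !gfpflags_allow_blocking(gfp) && !is_swiotlb_for_alloc(dev);
  }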
2021-12-08 | PM: hibernate: Allow ACPI hardware signature to be honoured | David Woodhouse | 2 | -2/+15
Theoretically, when the hardware signature in FACS changes, the OS is supposed to gracefully decline to attempt to resume from S4:

  "If the signature has changed, OSPM will not restore the system context and can boot from scratch"

In practice, Windows doesn't do this and many laptop vendors do allow the signature to change especially when docking/undocking, so it would be a bad idea to simply comply with the specification by default in the general case. However, there are use cases where we do want the compliant behaviour and we know it's safe. Specifically, when resuming virtual machines where we know the hypervisor has changed sufficiently that resume will fail. We really want to be able to *tell* the guest kernel not to try, so it boots cleanly and doesn't just crash.

This patch provides a way to opt in to the spec-compliant behaviour on the command line. A follow-up patch may do this automatically for certain "known good" machines based on a DMI match, or perhaps just for all hypervisor guests since there's no good reason a hypervisor would change the hardware_signature that it exposes to guests *unless* it wants them to obey the ACPI specification.

Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
2021-12-07 | tracing: Make trace_marker{,_raw} stream-like | John Keeping | 1 | -10/+8
The tracing marker files are write-only streams with no meaningful concept of file position. Using stream_open() to mark them as stream-like indicates this and has the added advantage that a single file descriptor can now be used from multiple threads without contention, thanks to clearing FMODE_ATOMIC_POS.

Note that this has the potential to break existing userspace, since both lseek(2) and pwrite(2) will now return ESPIPE, where previously lseek would have updated the stored offset and pwrite would have appended to the trace. A survey of libtracefs and several other projects found to use trace_marker(_raw) [1][2][3] suggests that everyone limits themselves to calling write(2) and close(2) on these file descriptors, so there is a good chance this will go unnoticed and the benefits of reduced overhead and lock contention seem worth the risk.

[1] https://github.com/google/perfetto
[2] https://github.com/intel/media-driver/
[3] https://w1.fi/cgit/hostap/

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Keeping <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
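For reference, marking a file as a stream is a one-line call from the ->open() handler; the sketch below uses a hypothetical handler name rather than the actual trace_marker fops.

  static int tracing_mark_open(struct inode *inode, struct file *filp)
  {
          /* Clears FMODE_ATOMIC_POS (plus the lseek/pread/pwrite modes) and
           * sets FMODE_STREAM, so concurrent writers no longer serialize on
           * f_pos and lseek(2)/pwrite(2) fail with ESPIPE. */
          stream_open(inode, filp);
          return 0;       /* The real handler would then do its normal setup. */
  }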
2021-12-07 | rcutorture: Combine n_max_cbs from all kthreads in a callback flood | Paul E. McKenney | 1 | -0/+9
With the addition of multiple callback-flood kthreads, the maximum number of callbacks from any one of those kthreads is reported in the rcutorture run summary. This commit changes this to report the sum of each kthread's maximum number of callbacks in a given callback-flooding episode. Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcutorture: Add ability to limit callback-flood intensity | Paul E. McKenney | 1 | -5/+14
The RCU tasks flavors of RCU now need concurrent callback flooding to test their ability to switch between single-queue mode and per-CPU queue mode, but their lack of heavy-duty forward-progress features rules out the use of rcutorture's current callback-flooding code. This commit therefore provides the ability to limit the intensity of the callback floods using a new ->cbflood_max field in the rcu_operations structure. When this field is zero, there is no limit, otherwise, each callback-flood kthread allocates at most ->cbflood_max callbacks. Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcutorture: Enable multiple concurrent callback-flood kthreads | Paul E. McKenney | 1 | -28/+86
This commit converts the rcutorture.fwd_progress module parameter from bool to int, so that it specifies the number of callback-flood kthreads. Values less than zero specify one kthread per CPU; however, the number of kthreads executing concurrently is limited to the number of online CPUs. This commit also reverses the order of the need-resched and callback-flood operations to cause the callback flooding to happen more nearly at the same time.

Cc: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcutorture: Avoid soft lockup during cpu stall | Wander Lairson Costa | 1 | -0/+5
If we use the stall_cpu module option, we may get a soft lockup warning if we don't also pass the stall_cpu_block option. Introduce the stall_no_softlockup option to avoid a soft lockup on CPU stall even when the stall_cpu_block option is not used.

Signed-off-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | locktorture,rcutorture,torture: Always log error message | Li Zhijian | 3 | -8/+8
Unconditionally log messages corresponding to errors. Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Li Zhijian <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | scftorture: Always log error message | Li Zhijian | 1 | -5/+4
Unconditionally log messages corresponding to errors. Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Li Zhijian <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcuscale: Always log error message | Li Zhijian | 1 | -7/+7
Unconditionally log messages corresponding to errors. Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Li Zhijian <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | scftorture: Remove unused SCFTORTOUT | Li Zhijian | 1 | -3/+0
There are no longer any users of SCFTORTOUT(), so this commit removes it. Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Li Zhijian <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | scftorture: Add missing '\n' to flush message | Li Zhijian | 1 | -3/+3
Add '\n' to macros to flush message for each call. Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Li Zhijian <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | refscale: Add missing '\n' to flush message | Li Zhijian | 1 | -4/+7
Add '\n' to macros to flush message for each call. Acked-by: Davidlohr Bueso <[email protected]> Signed-off-by: Li Zhijian <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | refscale: Always log the error message | Li Zhijian | 1 | -4/+3
An OOM is a serious error that should be logged even in non-verbose runs. This commit therefore adds an unconditional SCALEOUT_ERRSTRING() macro and uses it instead of VERBOSE_SCALEOUT_ERRSTRING() when reporting an OOM. [ paulmck: Drop do-while from SCALEOUT_ERRSTRING() due to only single statement. ] Signed-off-by: Li Zhijian <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
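The unconditional macro presumably mirrors the verbose variant minus the verbosity check, something like the sketch below (not verbatim; SCALE_FLAG and scale_type are assumed to be the existing refscale logging helpers):

  /* Sketch: always log, with an attention-getting "!!!" marker. */
  #define SCALEOUT_ERRSTRING(s, x...) \
          pr_alert("%s" SCALE_FLAG "!!! " s "\n", scale_type, ## x)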
2021-12-07 | rcu_tasks: Convert bespoke callback list to rcu_segcblist structure | Paul E. McKenney | 2 | -24/+30
This commit moves from a bespoke head and tail pointer in the rcu_tasks_percpu structure to an rcu_segcblist structure, thus allowing the grace-period sequence number to be associated with groups of callbacks. This in turn will allow callbacks to be invoked independently on different CPUs.

Reported-by: Martin Lau <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcu-tasks: Convert grace-period counter to grace-period sequence number | Paul E. McKenney | 1 | -4/+5
This commit moves the rcu_tasks structure's ->n_gps grace-period-counter field to a ->task_gp_seq grace-period sequence number in order to enable use of the rcu_segcblist structure for the callback lists. This in turn permits CPUs to lag behind the RCU Tasks grace-period sequence number without suffering long-term slowdowns in callback invocation.

Reported-by: Martin Lau <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection | Paul E. McKenney | 1 | -1/+6
This commit introduces a ->percpu_enqueue_shift field to the rcu_tasks structure, and uses it to shift down the CPU number in order to select a rcu_tasks_percpu structure. This field is currently set to a sufficiently large shift count to always select the CPU-0 instance of the rcu_tasks_percpu structure, and later commits will adjust this. Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
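The shift-based selection amounts to indexing the per-CPU array by the current CPU number shifted down; with a sufficiently large shift, every CPU maps to instance 0. A sketch (the ->rtpcpu array name is an assumption):

  /* Caller runs with interrupts disabled, so smp_processor_id() is stable. */
  int chosen_cpu = smp_processor_id() >> READ_ONCE(rtp->percpu_enqueue_shift);
  struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, chosen_cpu);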
2021-12-07 | rcu-tasks: Create per-CPU callback lists | Paul E. McKenney | 1 | -28/+78
Currently, RCU Tasks Trace (as well as the other two flavors of RCU Tasks) use a single global callback list. This works well and is simple, but expected changes in workload will cause this list to become a bottleneck. This commit therefore creates per-CPU callback lists for the various flavors of RCU Tasks, but continues queueing on a single list, namely that of CPU 0. Later commits will dynamically vary the number of lists in use to accommodate dynamic changes in workload. Reported-by: Martin Lau <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Tested-by: kernel test robot <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcu/nocb: Don't invoke local rcu core on callback overload from nocb kthread | Frederic Weisbecker | 1 | -5/+5
rcu_core() tries to ensure that its self-invocation in case of callback overload only happens in softirq/rcuc mode. Indeed it doesn't make sense to trigger local RCU core from the nocb_cb kthread since it can execute on a CPU different from the target rdp. Also in case of overload, the nocb_cb kthread simply iterates a new loop of callback processing.

However the "offloaded" check that aims at preventing misplaced rcu_core() invocations is wrong. First of all that state is volatile, and second: softirq/rcuc can execute while the target rdp is offloaded. As a result rcu_core() can be invoked on the wrong CPU while in the process of (de-)offloading.

Fix that by moving the rcu_core() self-invocation to rcu_core() itself, irrespective of the rdp offloaded state.

Tested-by: Valentin Schneider <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcu: Apply callbacks processing time limit only on softirq | Frederic Weisbecker | 1 | -13/+12
The time limit only makes sense when callbacks are serviced in softirq mode because:

_ In case we need to get back to the scheduler, cond_resched_tasks_rcu_qs() is called after each callback.

_ In case some other softirq vector needs the CPU, the call to local_bh_enable() before cond_resched_tasks_rcu_qs() takes care of them via a call to do_softirq().

Therefore, make sure the time limit only applies to softirq mode.

Reviewed-by: Valentin Schneider <[email protected]>
Tested-by: Valentin Schneider <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcu: Fix callbacks processing time limit retaining cond_resched() | Frederic Weisbecker | 1 | -10/+15
The callbacks processing time limit makes sure we are not exceeding a given amount of time executing the queue. However its "continue" clause bypasses the cond_resched() call on rcuc and NOCB kthreads, delaying it until we reach the limit, which can be very long... Make sure the scheduler has a higher priority than the time limit. Reviewed-by: Valentin Schneider <[email protected]> Tested-by: Valentin Schneider <[email protected]> Tested-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Uladzislau Rezki <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcu/nocb: Limit number of softirq callbacks only on softirq | Frederic Weisbecker | 1 | -3/+2
The current condition to limit the number of callbacks executed in a row checks the offloaded state of the rdp. Not only is that state volatile, it is also misleading: rcu_core() may well be executing callbacks concurrently with NOCB kthreads, and the offloaded state would then be verified in both cases. As a result the limit would spuriously stop applying on softirq while in the middle of the (de-)offloading process.

Fix and clarify the condition with those constraints in mind:

_ If callbacks are processed either by an rcuc or a NOCB kthread, the call to cond_resched_tasks_rcu_qs() is enough to take care of the overload.

_ If instead callbacks are processed by softirqs:
  * If need_resched(), exit the callbacks processing.
  * Otherwise, if the CPU is idle, we can continue.
  * Otherwise exit, because a softirq shouldn't interrupt a task for too long nor deprive other pending softirq vectors of the CPU.

Tested-by: Valentin Schneider <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
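Expressed as code, the clarified exit condition from the list above might read as follows once the per-batch limit has been reached (an illustrative sketch of the stated logic, not the exact hunk):

  if (in_serving_softirq()) {
          if (need_resched())
                  break;          /* The scheduler wants the CPU back. */
          if (!is_idle_task(current))
                  break;          /* Don't starve the interrupted task or
                                   * other pending softirq vectors. */
          /* Idle CPU: fine to keep invoking callbacks. */
  } else {
          /* rcuc or NOCB kthread: the cond_resched_tasks_rcu_qs() call made
           * after each callback already takes care of the overload. */
  }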
2021-12-07 | rcu/nocb: Use appropriate rcu_nocb_lock_irqsave() | Frederic Weisbecker | 1 | -5/+3
Instead of hardcoding IRQ save and nocb lock, use the consolidated API (and fix a comment as per Valentin Schneider's suggestion). Reviewed-by: Valentin Schneider <[email protected]> Tested-by: Valentin Schneider <[email protected]> Tested-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Uladzislau Rezki <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcu/nocb: Check a stable offloaded state to manipulate qlen_last_fqs_check | Frederic Weisbecker | 1 | -1/+1
It's not entirely obvious why rdp->qlen_last_fqs_check is updated before processing the queue only on offloaded rdp. This can have different effects, either in favour of triggering the force quiescent state path or not. For example:

1) If the number of callbacks has decreased since the last rdp->qlen_last_fqs_check update (because we recently called rcu_do_batch() and we executed below qhimark callbacks) and the number of processed callbacks on a subsequent do_batch() arranges for exceeding qhimark on non-offloaded but not on offloaded setup, then we may spare a later run to the force quiescent state slow path on __call_rcu_nocb_wake(), as compared to the non-offloaded counterpart scenario. Here is such an offloaded scenario instance:

    qhimark = 1000
    rdp->last_qlen_last_fqs_check = 3000
    rcu_segcblist_n_cbs(rdp) = 2000

    rcu_do_batch() {
        if (offloaded)
            rdp->last_qlen_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
        // run 1000 callback
        rcu_segcblist_n_cbs(rdp) = 1000
        // Not updating rdp->qlen_last_fqs_check
        if (count < rdp->qlen_last_fqs_check - qhimark)
            rdp->qlen_last_fqs_check = count;
    }

    call_rcu() * 1001 {
        __call_rcu_nocb_wake() {
            // not taking the fqs slowpath:
            // rcu_segcblist_n_cbs(rdp) == 2001
            // rdp->qlen_last_fqs_check == 2000
            // qhimark == 1000
            if (len > rdp->qlen_last_fqs_check + qhimark)
                ...
    }

In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check would be 1000 and the fqs slowpath would have executed.

2) If the number of callbacks has increased since the last rdp->qlen_last_fqs_check update (because we recently queued below qhimark callbacks) and the number of callbacks executed in rcu_do_batch() doesn't exceed qhimark for either offloaded or non-offloaded setup, then it's possible that the offloaded scenario later runs the force quiescent state slow path on __call_rcu_nocb_wake() while the non-offloaded one doesn't.

    qhimark = 1000
    rdp->last_qlen_last_fqs_check = 3000
    rcu_segcblist_n_cbs(rdp) = 2000

    rcu_do_batch() {
        if (offloaded)
            rdp->last_qlen_last_fqs_check = rcu_segcblist_n_cbs(rdp) // 2000
        // run 100 callbacks
        // concurrent queued 100
        rcu_segcblist_n_cbs(rdp) = 2000
        // Not updating rdp->qlen_last_fqs_check
        if (count < rdp->qlen_last_fqs_check - qhimark)
            rdp->qlen_last_fqs_check = count;
    }

    call_rcu() * 1001 {
        __call_rcu_nocb_wake() {
            // Taking the fqs slowpath:
            // rcu_segcblist_n_cbs(rdp) == 3001
            // rdp->qlen_last_fqs_check == 2000
            // qhimark == 1000
            if (len > rdp->qlen_last_fqs_check + qhimark)
                ...
    }

In the case of a non-offloaded scenario, rdp->qlen_last_fqs_check would be 3000 and the fqs slowpath would have executed.

The reason for updating rdp->qlen_last_fqs_check when invoking callbacks for offloaded CPUs is that there is usually no point in waking up either the rcuog or rcuoc kthreads while in this state. After all, both kthreads are prohibited from indefinite sleeps. The exception is when some huge number of callbacks are enqueued while rcu_do_batch() is in the midst of invoking, in which case interrupting the rcuog kthread's timed sleep might get more callbacks set up for the next grace period.

Reported-and-tested-by: Valentin Schneider <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Original-patch-by: Thomas Gleixner <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2021-12-07 | rcu/nocb: Make rcu_core() callbacks acceleration (de-)offloading safe | Frederic Weisbecker | 1 | -2/+16
When callbacks are offloaded, the NOCB kthreads handle the callbacks' progression on behalf of rcu_core(). However during the (de-)offloading process, the kthread may not be entirely up to the task. As a result some callbacks' grace-period sequence numbers may remain stale for a while because rcu_core() won't take care of them either.

Fix this by forcing callbacks acceleration from rcu_core() as long as the offloading process isn't complete.

Reported-and-tested-by: Valentin Schneider <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>