aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-01-22Merge branches 'doc.2021.01.06a', 'fixes.2021.01.04b', ↵Paul E. McKenney57-345/+2647
'kfree_rcu.2021.01.04a', 'mmdumpobj.2021.01.22a', 'nocb.2021.01.06a', 'rt.2021.01.04a', 'stall.2021.01.06a', 'torture.2021.01.12a' and 'tortureall.2021.01.06a' into HEAD doc.2021.01.06a: Documentation updates. fixes.2021.01.04b: Miscellaneous fixes. kfree_rcu.2021.01.04a: kfree_rcu() updates. mmdumpobj.2021.01.22a: Dump allocation point for memory blocks. nocb.2021.01.06a: RCU callback offload updates and cblist segment lengths. rt.2021.01.04a: Real-time updates. stall.2021.01.06a: RCU CPU stall warning updates. torture.2021.01.12a: Torture-test updates and polling SRCU grace-period API. tortureall.2021.01.06a: Torture-test script updates.
2021-01-22percpu_ref: Dump mem_dump_obj() info upon reference-count underflowPaul E. McKenney1-3/+9
Reference-count underflow for percpu_ref is detected in the RCU callback percpu_ref_switch_to_atomic_rcu(), and the resulting warning does not print anything allowing easy identification of which percpu_ref use case is underflowing. This is of course not normally a problem when developing a new percpu_ref use case because it is most likely that the problem resides in this new use case. However, when deploying a new kernel to a large set of servers, the underflow might well be a new corner case in any of the old percpu_ref use cases. This commit therefore calls mem_dump_obj() to dump out any additional available information on the underflowing percpu_ref instance. Cc: Ming Lei <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Joonsoo Kim <[email protected]> Reported-by: Andrii Nakryiko <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-22rcu: Make call_rcu() print mem_dump_obj() info for double-freed callbackPaul E. McKenney1-2/+5
The debug-object double-free checks in __call_rcu() print out the RCU callback function, which is usually sufficient to track down the double free. However, all uses of things like queue_rcu_work() will have the same RCU callback function (rcu_work_rcufn() in this case), so a diagnostic message for a double queue_rcu_work() needs more than just the callback function. This commit therefore calls mem_dump_obj() to dump out any additional available information on the double-freed callback. Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: <[email protected]> Reported-by: Andrii Nakryiko <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-22mm: Make mem_obj_dump() vmalloc() dumps include start and lengthPaul E. McKenney1-1/+2
This commit adds the starting address and number of pages to the vmalloc() information dumped by way of vmalloc_dump_obj(). Cc: Andrew Morton <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Reported-by: Andrii Nakryiko <[email protected]> Suggested-by: Vlastimil Babka <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-22mm: Make mem_dump_obj() handle vmalloc() memoryPaul E. McKenney3-6/+26
This commit adds vmalloc() support to mem_dump_obj(). Note that the vmalloc_dump_obj() function combines the checking and dumping, in contrast with the split between kmem_valid_obj() and kmem_dump_obj(). The reason for the difference is that the checking in the vmalloc() case involves acquiring a global lock, and redundant acquisitions of global locks should be avoided, even on not-so-fast paths. Note that this change causes on-stack variables to be reported as vmalloc() storage from kernel_clone() or similar, depending on the degree of inlining that your compiler does. This is likely more helpful than the earlier "non-paged (local) memory". Cc: Andrew Morton <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: <[email protected]> Reported-by: Andrii Nakryiko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-22mm: Make mem_dump_obj() handle NULL and zero-sized pointersPaul E. McKenney1-1/+6
This commit makes mem_dump_obj() call out NULL and zero-sized pointers specially instead of classifying them as non-paged memory. Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: <[email protected]> Reported-by: Andrii Nakryiko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-22mm: Add mem_dump_obj() to print source of memory blockPaul E. McKenney8-0/+181
There are kernel facilities such as per-CPU reference counts that give error messages in generic handlers or callbacks, whose messages are unenlightening. In the case of per-CPU reference-count underflow, this is not a problem when creating a new use of this facility because in that case the bug is almost certainly in the code implementing that new use. However, trouble arises when deploying across many systems, which might exercise corner cases that were not seen during development and testing. Here, it would be really nice to get some kind of hint as to which of several uses the underflow was caused by. This commit therefore exposes a mem_dump_obj() function that takes a pointer to memory (which must still be allocated if it has been dynamically allocated) and prints available information on where that memory came from. This pointer can reference the middle of the block as well as the beginning of the block, as needed by things like RCU callback functions and timer handlers that might not know where the beginning of the memory block is. These functions and handlers can use mem_dump_obj() to print out better hints as to where the problem might lie. The information printed can depend on kernel configuration. For example, the allocation return address can be printed only for slab and slub, and even then only when the necessary debug has been enabled. For slab, build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space to the next power of two or use the SLAB_STORE_USER when creating the kmem_cache structure. For slub, build with CONFIG_SLUB_DEBUG=y and boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create() if more focused use is desired. Also for slub, use CONFIG_STACKTRACE to enable printing of the allocation-time stack trace. Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: <[email protected]> Reported-by: Andrii Nakryiko <[email protected]> [ paulmck: Convert to printing and change names per Joonsoo Kim. ] [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ] [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ] [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ] [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ] [ paulmck: Explicitly check for small pointers per Naresh Kamboju. ] Acked-by: Joonsoo Kim <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Naresh Kamboju <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-12rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01Paul E. McKenney2-0/+2
RCU's rcutree.use_softirq=0 kernel boot parameter substitutes the per-CPU rcuc kthreads for softirq, which is used in real-time installations. However, none of the rcutorture scenarios test this parameter. This commit therefore adds rcutree.use_softirq=0 to the RUDE01 and TASKS01 rcutorture scenarios, both of which indirectly exercise RCU. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Maintain torture-specific set of CPUs-online booksPaul E. McKenney3-2/+23
The TREE01 rcutorture scenario intentionally creates confusion as to the number of available CPUs by specifying the "maxcpus=8 nr_cpus=43" kernel boot parameters. This can disable rcutorture's load shedding, which currently uses num_online_cpus(), which would count the extra 35 CPUs. However, the rcutorture guest OS will be provisioned with only 8 CPUs, which means that rcutorture will present full load even when all but one of the original 8 CPUs are offline. This can result in spurious errors due to extreme overloading of that single remaining CPU. This commit therefore keeps a separate set of books on the number of usable online CPUs, so that torture_num_online_cpus() is used for load shedding instead of num_online_cpus(). Note that initial sizing must use num_online_cpus() because torture_num_online_cpus() will return NR_CPUS until shortly after torture_onoff_init() is invoked. Reported-by: Frederic Weisbecker <[email protected]> [ paulmck: Apply feedback from kernel test robot. ] Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Clean up after torture-test CPU hotpluggingPaul E. McKenney1-14/+22
This commit puts all CPUs back online at the end of a torture test, and also unconditionally puts them online at the beginning of the test, rather than just in the case of built-in tests. This allows torture tests to behave in a predictable manner, whether built-in or based on modules. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcutorture: Make object_debug also double call_rcu() heap objectPaul E. McKenney1-0/+5
This commit provides a test for call_rcu() printing the allocation address of a double-freed callback by double-freeing a callback allocated via kmalloc(). However, this commit does not depend on any other commit. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Throttle VERBOSE_TOROUT_*() outputPaul E. McKenney3-2/+41
This commit adds kernel boot parameters torture.verbose_sleep_frequency and torture.verbose_sleep_duration, which allow VERBOSE_TOROUT_*() output to be throttled with periodic sleeps on large systems. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Make refscale throttle high-rate printk()sPaul E. McKenney1-1/+3
This commit adds a short delay for verbose_batched-throttled printk()s to further decrease console flooding. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcutorture: Use hrtimers for reader and writer delaysPaul E. McKenney1-4/+3
This commit replaces schedule_timeout_uninterruptible() and schedule_timeout_interruptible() with torture_hrtimeout_us() and torture_hrtimeout_jiffies() to avoid timer-wheel synchronization. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Make stutter use torture_hrtimeout_*() functionsPaul E. McKenney1-15/+5
This commit saves a few lines of code by making the stutter_wait() and torture_stutter() functions use torture_hrtimeout_jiffies() and torture_hrtimeout_us(). Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcutorture: Use torture_hrtimeout_jiffies() to avoid busy-waitsPaul E. McKenney1-19/+7
Because rcu_torture_writer() and rcu_torture_fakewriter() predate hrtimers, they do timer-wheel-decoupled timed waits by using the timer-wheel-based schedule_timeout_interruptible() functions in conjunction with a random udelay()-based wait. This latter unnecessarily burns CPU time, so this commit instead uses torture_hrtimeout_jiffies() to decouple from the timer wheels without busy-waiting. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Add fuzzed hrtimer-based sleep functionsPaul E. McKenney2-0/+82
This commit adds torture_hrtimeout_ns(), torture_hrtimeout_us(), torture_hrtimeout_ms(), torture_hrtimeout_jiffies(), and torture_hrtimeout_s(), each of which uses hrtimers to block for a fuzzed time interval. These functions are intended to be used by the various torture tests to decouple wakeups from the timer wheel, thus providing more opportunity for Murphy to insert destructive race conditions. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcutorture: Make rcu_torture_fakewriter() use blocking wait primitivesPaul E. McKenney1-8/+32
Full testing of the new SRCU polling API requires that the fake writers also use it in order to test concurrent calls to all of the API members, especially start_poll_synchronize_srcu(). This commit makes rcu_torture_fakewriter() use all available blocking grace-period-wait primitives available from the RCU flavor under test. Link: https://lore.kernel.org/rcu/[email protected]/ Reported-by: Kent Overstreet <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcutorture: Make synctype[] and nsynctype be static globalPaul E. McKenney1-26/+36
Full testing of the new SRCU polling API requires that the fake writers also use it in order to test concurrent calls to all of the API members, especially start_poll_synchronize_srcu(). This commit prepares the ground for this by making the synctype[] and nsynctype variables be static globals so that the rcu_torture_fakewriter() function can access them. Initialization of these variables is moved from rcu_torture_writer() to a new rcu_torture_write_types() function that is invoked from rcu_torture_init() just before the first writer kthread is spawned. Link: https://lore.kernel.org/rcu/[email protected]/ Reported-by: Kent Overstreet <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcutorture: Require entire stutter period be post-bootPaul E. McKenney1-1/+3
Currently, the rcu_torture_writer() function checks that all required grace periods elapse during a stutter interval, which is a multi-second time period during which the test load is removed. However, this check is suppressed during early boot (that is, before init is spawned) in order to avoid false positives that otherwise occur due to heavy load on the single boot CPU. Unfortunately, this approach is insufficient. It is possible that the stutter interval might end just as init is spawned, so that early boot conditions prevailed during almost the entire stutter interval. This commit therefore takes a snapshot of boot-complete state just before the stutter interval, thus suppressing the check for failure to complete grace periods unless the entire stutter interval took place after early boot. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06refscale: Allow summarization of verbose outputPaul E. McKenney2-5/+22
The refscale test prints enough per-kthread console output to provoke RCU CPU stall warnings on large systems. This commit therefore allows this output to be summarized. For example, the refscale.verbose_batched=32 boot parameter would causes only every 32nd line of output to be logged. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Compress KASAN vmlinux filesPaul E. McKenney1-2/+46
The sizes of vmlinux files built with KASAN enabled can approach a full gigabyte, which can result in disk overflow sooner rather than later. Fortunately, the xz command compresses them by almost an order of magnitude. This commit therefore uses xz to compress vmlinux file built by torture.sh with KASAN enabled. However, xz is not the fastest thing in the world. In fact, it is way slower than rotating-rust mass storage. This commit therefore also adds a --compress-kasan-vmlinux argument to specify the degree of xz concurrency, which defaults to using all available CPUs if there are that many files in need of compression. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Add --kcsan-kmake-arg to torture.sh for KCSANPaul E. McKenney1-6/+15
In 2020, running KCSAN often requires careful choice of compiler. This commit therefore adds a --kcsan-kmake-arg parameter to torture.sh to allow specifying (for example) "CC=clang" to the kernel build process to correctly build a KCSAN-enabled kernel. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Add command and results directory to torture.sh logPaul E. McKenney1-4/+7
This commit adds the command and arguments to the torture.sh log file, and also outputs the results directory. This latter allows impatient users to quickly find the results that are being generated by the current run. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Allow scenarios to be specified to torture.shPaul E. McKenney1-3/+43
This commit adds --configs-rcutorture, --configs-locktorture, and --configs-scftorture arguments to torture.sh, allowing the desired set of scenarios to be passed to each. The default for each has been changed from a large-system-appropriate set to just CFLIST for each. Users are encouraged to create scripts that provide appropriate settings for their specific systems. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Drop log.long generation from torture.shPaul E. McKenney1-4/+1
Now that kvm.sh puts all the relevant details in the "log" file, there is no need for torture.sh to generate a separate "log.long" file. This commit therefore drops this from torture.sh. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Make torture.sh refuse to do zero-length runsPaul E. McKenney1-6/+19
This commit causes torture.sh to check for zero-length runs and to take the cowardly option of refusing to run them, logging its cowardice for later inspection. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Make torture.sh throttle VERBOSE_TOROUT_*() for refscalePaul E. McKenney1-1/+1
This commit causes torture.sh to use the torture.verbose_sleep_frequency kernel boot parameter to throttle verbose refscale output on large systems. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Make torture.sh allmodconfig retain and label outputPaul E. McKenney1-3/+6
This commit places "---" markers in the torture.sh script's allmodconfig output, and uses "<<" to avoid overwriting earlier output from this build test. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Create doyesno helper function for torture.shPaul E. McKenney1-59/+19
This commit saves a few lines of code by creating a doyesno helper bash function for argument parsing. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Make torture.sh refscale runs use verbose_batched module parameterPaul E. McKenney1-1/+6
On large systems, the refscale printk() rate can overrun the file system's ability to accept console log messages. This commit therefore uses the new verbose_batched module parameter to rate-limit some of the higher-rate printk() calls. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Make torture.sh rcuscale and refscale deal with allmodconfigPaul E. McKenney1-2/+2
The .mod.c files created by allmodconfig builds interfers with the approach torture.sh uses to enumerate types of rcuscale and refscale runs. This commit therefore tightens the pattern matching to avoid this interference. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Enable torture.sh argument checkingPaul E. McKenney1-5/+5
This commit uncomments the argument checking for the --duration argument to torture.sh. While in the area, it also corrects the duration units from seconds to minutes. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Auto-size SCF and scaling runs based on number of CPUsPaul E. McKenney1-7/+12
This commit improves torture.sh flexibility by autoscaling the number of CPUs to be used in variable-CPUs torture tests, including scftorture, refscale, rcuscale, and kvfree. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Add "make allmodconfig" to torture.shPaul E. McKenney1-1/+36
This commit adds the ability to do "make allmodconfig" to torture.sh, given that normal rcutorture runs do not normally catch missing exports. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Remove use of "eval" in torture.shPaul E. McKenney1-11/+44
The bash "eval" command enables Bobby Tables attacks, which might not be a concern in torture testing by themselves, but one could imagine these combined with a cut-and-paste attack. This commit therefore gets rid of them. This comes at a price in terms of bash quoting not working nicely, so the "--bootargs" argument lists are now passed to torture_one via a bash-variable side channel. This might be a bit ugly, but it will also allow torture.sh to grow its own --bootargs parameter. While in the area, add proper header comments for the bash functions. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Make torture.sh use common time-duration bash functionsPaul E. McKenney1-13/+7
This commit makes torture.sh use the new bash functions get_starttime() and get_starttime_duration() created for kvm.sh. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06torture: Add torture.sh torture-everything scriptPaul E. McKenney1-0/+301
Although tailoring a specific set of kvm.sh runs has served rcutorture testing well over many years, it requires a relatively distraction-free environment, which is not always available. This commit therefore adds a prototype torture.sh script that by default tortures pretty much everything the rcutorture scripting is designed to torture, and which can be given command-line arguments to take a more focused approach. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcu: Check and report missed fqs timer wakeup on RCU stallNeeraj Upadhyay3-11/+69
For a new grace period request, the RCU GP kthread transitions through following states: a. [RCU_GP_WAIT_GPS] -> [RCU_GP_DONE_GPS] The RCU_GP_WAIT_GPS state is where the GP kthread waits for a request for a new GP. Once it receives a request (for example, when a new RCU callback is queued), the GP kthread transitions to RCU_GP_DONE_GPS. b. [RCU_GP_DONE_GPS] -> [RCU_GP_ONOFF] Grace period initialization starts in rcu_gp_init(), which records the start of new GP in rcu_state.gp_seq and transitions to RCU_GP_ONOFF. c. [RCU_GP_ONOFF] -> [RCU_GP_INIT] The purpose of the RCU_GP_ONOFF state is to apply the online/offline information that was buffered for any CPUs that recently came online or went offline. This state is maintained in per-leaf rcu_node bitmasks, with the buffered state in ->qsmaskinitnext and the state for the upcoming GP in ->qsmaskinit. At the end of this RCU_GP_ONOFF state, each bit in ->qsmaskinit will correspond to a CPU that must pass through a quiescent state before the upcoming grace period is allowed to complete. However, a leaf rcu_node structure with an all-zeroes ->qsmaskinit cannot necessarily be ignored. In preemptible RCU, there might well be tasks still in RCU read-side critical sections that were first preempted while running on one of the CPUs managed by this structure. Such tasks will be queued on this structure's ->blkd_tasks list. Only after this list fully drains can this leaf rcu_node structure be ignored, and even then only if none of its CPUs have come back online in the meantime. Once that happens, the ->qsmaskinit masks further up the tree will be updated to exclude this leaf rcu_node structure. Once the ->qsmaskinitnext and ->qsmaskinit fields have been updated as needed, the GP kthread transitions to RCU_GP_INIT. d. [RCU_GP_INIT] -> [RCU_GP_WAIT_FQS] The purpose of the RCU_GP_INIT state is to copy each ->qsmaskinit to the ->qsmask field within each rcu_node structure. This copying is done breadth-first from the root to the leaves. Why not just copy directly from ->qsmaskinitnext to ->qsmask? Because the ->qsmaskinitnext masks can change in the meantime as additional CPUs come online or go offline. Such changes would result in inconsistencies in the ->qsmask fields up and down the tree, which could in turn result in too-short grace periods or grace-period hangs. These issues are avoided by snapshotting the leaf rcu_node structures' ->qsmaskinitnext fields into their ->qsmaskinit counterparts, generating a consistent set of ->qsmaskinit fields throughout the tree, and only then copying these consistent ->qsmaskinit fields to their ->qsmask counterparts. Once this initialization step is complete, the GP kthread transitions to RCU_GP_WAIT_FQS, where it waits to do a force-quiescent-state scan on the one hand or for the end of the grace period on the other. e. [RCU_GP_WAIT_FQS] -> [RCU_GP_DOING_FQS] The RCU_GP_WAIT_FQS state waits for one of three things: (1) An explicit request to do a force-quiescent-state scan, (2) The end of the grace period, or (3) A short interval of time, after which it will do a force-quiescent-state (FQS) scan. The explicit request can come from rcutorture or from any CPU that has too many RCU callbacks queued (see the qhimark kernel parameter and the RCU_GP_FLAG_OVLD flag). The aforementioned "short period of time" is specified by the jiffies_till_first_fqs boot parameter for a given grace period's first FQS scan and by the jiffies_till_next_fqs for later FQS scans. Either way, once the wait is over, the GP kthread transitions to RCU_GP_DOING_FQS. f. [RCU_GP_DOING_FQS] -> [RCU_GP_CLEANUP] The RCU_GP_DOING_FQS state performs an FQS scan. Each such scan carries out two functions for any CPU whose bit is still set in its leaf rcu_node structure's ->qsmask field, that is, for any CPU that has not yet reported a quiescent state for the current grace period: i. Report quiescent states on behalf of CPUs that have been observed to be idle (from an RCU perspective) since the beginning of the grace period. ii. If the current grace period is too old, take various actions to encourage holdout CPUs to pass through quiescent states, including enlisting the aid of any calls to cond_resched() and might_sleep(), and even including IPIing the holdout CPUs. These checks are skipped for any leaf rcu_node structure with a all-zero ->qsmask field, however such structures are subject to RCU priority boosting if there are tasks on a given structure blocking the current grace period. The end of the grace period is detected when the root rcu_node structure's ->qsmask is zero and when there are no longer any preempted tasks blocking the current grace period. (No, this last check is not redundant. To see this, consider an rcu_node tree having exactly one structure that serves as both root and leaf.) Once the end of the grace period is detected, the GP kthread transitions to RCU_GP_CLEANUP. g. [RCU_GP_CLEANUP] -> [RCU_GP_CLEANED] The RCU_GP_CLEANUP state marks the end of grace period by updating the rcu_state structure's ->gp_seq field and also all rcu_node structures' ->gp_seq field. As before, the rcu_node tree is traversed in breadth first order. Once this update is complete, the GP kthread transitions to the RCU_GP_CLEANED state. i. [RCU_GP_CLEANED] -> [RCU_GP_INIT] Once in the RCU_GP_CLEANED state, the GP kthread immediately transitions into the RCU_GP_INIT state. j. The role of timers. If there is at least one idle CPU, and if timers are not firing, the transition from RCU_GP_DOING_FQS to RCU_GP_CLEANUP will never happen. Timers can fail to fire for a number of reasons, including issues in timer configuration, issues in the timer framework, and failure to handle softirqs (for example, when there is a storm of interrupts). Whatever the reason, if the timers fail to fire, the GP kthread will never be awakened, resulting in RCU CPU stall warnings and eventually in OOM. However, an RCU CPU stall warning has a large number of potential causes, as documented in Documentation/RCU/stallwarn.rst. This commit therefore adds analysis to the RCU CPU stall-warning code to emit an additional message if the cause of the stall is likely to be timer failure. Signed-off-by: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcu: Do any deferred nocb wakeups at CPU offline timePaul E. McKenney1-0/+3
Because the need to wake a nocb GP kthread ("rcuog") is sometimes detected when wakeups cannot be done, these wakeups can be deferred. The wakeups are then carried out by calls to do_nocb_deferred_wakeup() at various safe points in the code, including RCU's idle hooks. However, when a CPU goes offline, it invokes arch_cpu_idle_dead() without invoking any of RCU's idle hooks. This commit therefore adds a call to do_nocb_deferred_wakeup() in rcu_report_dead() in order to handle any deferred wakeups that have been requested by the outgoing CPU. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcu/nocb: Code-style nits in callback-offloading togglingPaul E. McKenney4-37/+30
This commit addresses a few code-style nits in callback-offloading toggling, including one that predates this toggling. Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcu/nocb: Add nocb CB kthread list to show_rcu_nocb_state() outputPaul E. McKenney1-0/+1
This commit improves debuggability by indicating laying out the order in which rcuoc kthreads appear in the ->nocb_next_cb_rdp list. Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcu/nocb: Add grace period and task state to show_rcu_nocb_state() outputPaul E. McKenney2-7/+43
This commit improves debuggability by indicating which grace period each batch of nocb callbacks is waiting on and by showing the task state and last CPU for reach nocb kthread. [ paulmck: Handle !SMP CB offloading per kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06tools/rcutorture: Support nocb toggle in TREE01Frederic Weisbecker1-1/+3
This commit adds periodic toggling of 7 of 8 CPUs every second to TREE01 in order to test NOCB toggle code. Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Thomas Gleixner <[email protected]> Inspired-by: Paul E. McKenney <[email protected]> Tested-by: Boqun Feng <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcutorture: Test runtime toggling of CPUs' callback offloadingPaul E. McKenney2-3/+95
Frederic Weisbecker is adding the ability to change the rcu_nocbs state of CPUs at runtime, that is, to offload and deoffload their RCU callback processing without the need to reboot. As the old saying goes, "if it ain't tested, it don't work", so this commit therefore adds prototype rcutorture testing for this capability. Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]>
2021-01-06timer: Add timer_curr_running()Frederic Weisbecker2-0/+15
This commit adds a timer_curr_running() function that verifies that the current code is running in the context of the specified timer's handler. Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Thomas Gleixner <[email protected]> Tested-by: Boqun Feng <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06cpu/hotplug: Add lockdep_is_cpus_held()Frederic Weisbecker2-0/+9
This commit adds a lockdep_is_cpus_held() function to verify that the proper locks are held and that various operations are running in the correct context. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Boqun Feng <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcu/nocb: Locally accelerate callbacks as long as offloading isn't completeFrederic Weisbecker1-4/+3
The local callbacks processing checks if any callbacks need acceleration. This commit carries out this checking under nocb lock protection in the middle of toggle operations, during which time rcu_core() executes concurrently with GP/CB kthreads. Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Thomas Gleixner <[email protected]> Inspired-by: Paul E. McKenney <[email protected]> Tested-by: Boqun Feng <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcu/nocb: Process batch locally as long as offloading isn't completeFrederic Weisbecker2-1/+14
This commit makes sure to process the callbacks locally (via either RCU_SOFTIRQ or the rcuc kthread) whenever the segcblist isn't entirely offloaded. This ensures that callbacks are invoked one way or another while a CPU is in the middle of a toggle operation. Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Thomas Gleixner <[email protected]> Inspired-by: Paul E. McKenney <[email protected]> Tested-by: Boqun Feng <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2021-01-06rcu/nocb: Only cond_resched() from actual offloaded batch processingFrederic Weisbecker1-2/+1
During a toggle operations, rcu_do_batch() may be invoked concurrently by softirqs and offloaded processing for a given CPU's callbacks. This commit therefore makes sure cond_resched() is invoked only from the offloaded context. Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Thomas Gleixner <[email protected]> Inspired-by: Paul E. McKenney <[email protected]> Tested-by: Boqun Feng <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>