| Age | Commit message (Collapse) | Author | Files | Lines |
|
Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.
LOCKUP_DETECTOR implies the general boot, sysctl, and programming
interfaces for the lockup detectors.
An architecture that wants to use a hard lockup detector must define
HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.
Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the
minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and
does not implement the LOCKUP_DETECTOR interfaces.
sparc is unusual in that it has started to implement some of the
interfaces, but not fully yet. It should probably be converted to a full
HAVE_HARDLOCKUP_DETECTOR_ARCH.
[[email protected]: fix]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Babu Moger <[email protected]> [sparc]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For architectures that define HAVE_NMI_WATCHDOG, instead of having them
provide the complete touch_nmi_watchdog() function, just have them
provide arch_touch_nmi_watchdog().
This gives the generic code more flexibility in implementing this
function, and arch implementations don't miss out on touching the
softlockup watchdog or other generic details.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Babu Moger <[email protected]> [sparc]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Improve watchdog config for arch watchdogs", v4.
A series to make the hardlockup watchdog more easily replaceable by arch
code. The last patch provides some justification for why we want to do
this (existing sparc watchdog is another that could benefit).
This patch (of 5):
Remove unused declaration.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nicholas Piggin <[email protected]>
Reviewed-by: Don Zickus <[email protected]>
Reviewed-by: Babu Moger <[email protected]>
Tested-by: Babu Moger <[email protected]> [sparc]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
These methods don't belong into <linux/sched.h>, they are neither directly
related to task_struct or are scheduler functionality.
Put them next to the other watchdog methods in <linux/nmi.h>.
( Arguably that header's name is a misnomer, and this patch makes it
more so - but it should be renamed in another patch. )
Acked-by: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
On an overloaded system, it is possible that a change in the watchdog
threshold can be delayed long enough to trigger a false positive.
This can easily be achieved by having a cpu spinning indefinitely on a
task, while another cpu updates watchdog threshold.
What happens is while trying to park the watchdog threads, the hrtimers
on the other cpus trigger and reprogram themselves with the new slower
watchdog threshold. Meanwhile, the nmi watchdog is still programmed
with the old faster threshold.
Because the one cpu is blocked, it prevents the thread parking on the
other cpus from completing, which is needed to shutdown the nmi watchdog
and reprogram it correctly. As a result, a false positive from the nmi
watchdog is reported.
Fix this by setting a park_in_progress flag to block all lockups until
the parking is complete.
Fix provided by Ulrich Obergfell.
[[email protected]: s/park_in_progress/watchdog_park_in_progress/]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Don Zickus <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Clean up watchdog handlers", v2.
This is an attempt to cleanup watchdog handlers. Right now,
kernel/watchdog.c implements both softlockup and hardlockup detectors.
Softlockup code is generic. Hardlockup code is arch specific. Some
architectures don't use hardlockup detectors. They use their own
watchdog detectors. To make both these combination work, we have
numerous #ifdefs in kernel/watchdog.c.
We are trying here to make these handlers independent of each other.
Also provide an interface for architectures to implement their own
handlers. watchdog_nmi_enable and watchdog_nmi_disable will be defined
as weak such that architectures can override its definitions.
Thanks to Don Zickus for his suggestions.
Here are our previous discussions
http://www.spinics.net/lists/sparclinux/msg16543.html
http://www.spinics.net/lists/sparclinux/msg16441.html
This patch (of 3):
Move shared macros and definitions to nmi.h so that watchdog.c, new file
watchdog_hld.c or any other architecture specific handler can use those
definitions.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Babu Moger <[email protected]>
Acked-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Yaowei Bai <[email protected]>
Cc: Aaron Tomlin <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: Josh Hunt <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "improvements to the nmi_backtrace code" v9.
This patch series modifies the trigger_xxx_backtrace() NMI-based remote
backtracing code to make it more flexible, and makes a few small
improvements along the way.
The motivation comes from the task isolation code, where there are
scenarios where we want to be able to diagnose a case where some cpu is
about to interrupt a task-isolated cpu. It can be helpful to see both
where the interrupting cpu is, and also an approximation of where the
cpu that is being interrupted is. The nmi_backtrace framework allows us
to discover the stack of the interrupted cpu.
I've tested that the change works as desired on tile, and build-tested
x86, arm, mips, and sparc64. For x86 I confirmed that the generic
cpuidle stuff as well as the architecture-specific routines are in the
new cpuidle section. For arm, mips, and sparc I just build-tested it
and made sure the generic cpuidle routines were in the new cpuidle
section, but I didn't attempt to figure out which the platform-specific
idle routines might be. That might be more usefully done by someone
with platform experience in follow-up patches.
This patch (of 4):
Currently you can only request a backtrace of either all cpus, or all
cpus but yourself. It can also be helpful to request a remote backtrace
of a single cpu, and since we want that, the logical extension is to
support a cpumask as the underlying primitive.
This change modifies the existing lib/nmi_backtrace.c code to take a
cpumask as its basic primitive, and modifies the linux/nmi.h code to use
the new "cpumask" method instead.
The existing clients of nmi_backtrace (arm and x86) are converted to
using the new cpumask approach in this change.
The other users of the backtracing API (sparc64 and mips) are converted
to use the cpumask approach rather than the all/allbutself approach.
The mips code ignored the "include_self" boolean but with this change it
will now also dump a local backtrace if requested.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Chris Metcalf <[email protected]>
Tested-by: Daniel Thompson <[email protected]> [arm]
Reviewed-by: Aaron Tomlin <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: David Miller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 10f9014912 ("x86: Cleanup hw_nmi.c cruft") removed unused
code in the hw_nmi.c file because of the redesign of the hardlockup
watchdog but left declaration of hw_nmi_is_cpu_stuck in linux/nmi.h,
so remvoe it.
Signed-off-by: Yaowei Bai <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
In many cases of hardlockup reports, it's actually not possible to know
why it triggered, because the CPU that got stuck is usually waiting on a
resource (with IRQs disabled) in posession of some other CPU is holding.
IOW, we are often looking at the stacktrace of the victim and not the
actual offender.
Introduce sysctl / cmdline parameter that makes it possible to have
hardlockup detector perform all-CPU backtrace.
Signed-off-by: Jiri Kosina <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Acked-by: Don Zickus <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pull NMI backtrace update from Russell King:
"These changes convert the x86 NMI handling to be a library
implementation which other architectures can make use of. Thomas
Gleixner has reviewed and tested these changes, and wishes me to send
these rather than taking them through the tip tree.
The final patch in the set adds an initial implementation using this
infrastructure to ARM, even though it doesn't send the IPI at "NMI"
level. Patches are in progress to add the ARM equivalent of NMI, but
we still need the IRQ-level fallback for systems where the "NMI" isn't
available due to secure firmware denying access to it"
* 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: add basic support for on-demand backtrace of other CPUs
nmi: x86: convert to generic nmi handler
nmi: create generic NMI backtrace implementation
|
|
Rename watchdog_suspend() to lockup_detector_suspend() and
watchdog_resume() to lockup_detector_resume() to avoid confusion with the
watchdog subsystem and to be consistent with the existing name
lockup_detector_init().
Also provide comment blocks to explain the watchdog_running and
watchdog_suspended variables and their relationship.
Signed-off-by: Ulrich Obergfell <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all() since
these functions are no longer needed. If a subsystem has a need to
deactivate the watchdog temporarily, it should utilize the
watchdog_suspend() and watchdog_resume() functions.
[[email protected]: fix build with CONFIG_LOCKUP_DETECTOR=m]
Signed-off-by: Ulrich Obergfell <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This interface can be utilized to deactivate the hard and soft lockup
detector temporarily. Callers are expected to minimize the duration of
deactivation. Multiple deactivations are allowed to occur in parallel but
should be rare in practice.
[[email protected]: remove unneeded static initialization]
Signed-off-by: Ulrich Obergfell <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kernel's NMI watchdog has nothing to do with the watchdog subsystem.
Its header declarations should be in linux/nmi.h, not linux/watchdog.h.
The code provided two sets of dummy functions if HARDLOCKUP_DETECTOR is
not configured, one in the include file and one in kernel/watchdog.c.
Remove the dummy functions from kernel/watchdog.c and use those from the
include file.
Signed-off-by: Guenter Roeck <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Don Zickus <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
x86s NMI backtrace implementation (for arch_trigger_all_cpu_backtrace())
is fairly generic in nature - the only architecture specific bits are
the act of raising the NMI to other CPUs, and reporting the status of
the NMI handler.
These are fairly simple to factor out, and produce a generic
implementation which can be shared between ARM and x86.
Reviewed-by: Thomas Gleixner <[email protected]>
Signed-off-by: Russell King <[email protected]>
|
|
Change the default behavior of watchdog so it only runs on the
housekeeping cores when nohz_full is enabled at build and boot time.
Allow modifying the set of cores the watchdog is currently running on
with a new kernel.watchdog_cpumask sysctl.
In the current system, the watchdog subsystem runs a periodic timer that
schedules the watchdog kthread to run. However, nohz_full cores are
designed to allow userspace application code running on those cores to
have 100% access to the CPU. So the watchdog system prevents the
nohz_full application code from being able to run the way it wants to,
thus the motivation to suppress the watchdog on nohz_full cores, which
this patchset provides by default.
However, if we disable the watchdog globally, then the housekeeping
cores can't benefit from the watchdog functionality. So we allow
disabling it only on some cores. See Documentation/lockup-watchdogs.txt
for more information.
[[email protected]: fix a watchdog crash in some configurations]
Signed-off-by: Chris Metcalf <[email protected]>
Acked-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Have kvm_guest_init() use hardlockup_detector_disable() instead of
watchdog_enable_hardlockup_detector(false).
Remove the watchdog_hardlockup_detector_is_enabled() and the
watchdog_enable_hardlockup_detector() function which are no longer needed.
Signed-off-by: Ulrich Obergfell <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With the current user interface of the watchdog mechanism it is only
possible to disable or enable both lockup detectors at the same time.
This series introduces new kernel parameters and changes the semantics of
some existing kernel parameters, so that the hard lockup detector and the
soft lockup detector can be disabled or enabled individually. With this
series applied, the user interface is as follows.
- parameters in /proc/sys/kernel
. soft_watchdog
This is a new parameter to control and examine the run state of
the soft lockup detector.
. nmi_watchdog
The semantics of this parameter have changed. It can now be used
to control and examine the run state of the hard lockup detector.
. watchdog
This parameter is still available to control the run state of both
lockup detectors at the same time. If this parameter is examined,
it shows the logical OR of soft_watchdog and nmi_watchdog.
. watchdog_thresh
The semantics of this parameter are not affected by the patch.
- kernel command line parameters
. nosoftlockup
The semantics of this parameter have changed. It can now be used
to disable the soft lockup detector at boot time.
. nmi_watchdog=0 or nmi_watchdog=1
Disable or enable the hard lockup detector at boot time. The patch
introduces '=1' as a new option.
. nowatchdog
The semantics of this parameter are not affected by the patch. It
is still available to disable both lockup detectors at boot time.
Also, remove the proc_dowatchdog() function which is no longer needed.
[[email protected]: wrote changelog]
[[email protected]: update documentation for kernel params and sysctl]
Signed-off-by: Ulrich Obergfell <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Separate handlers for each watchdog parameter in /proc/sys/kernel replace
the proc_dowatchdog() function. Three of those handlers merely call
proc_watchdog_common() with one different argument.
Signed-off-by: Ulrich Obergfell <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The hardlockup and softockup had always been tied together. Due to the
request of KVM folks, they had a need to have one enabled but not the
other. Internally rework the code to split things apart more cleanly.
There is a bunch of churn here, but the end result should be code that
should be easier to maintain and fix without knowing the internals of what
is going on.
This patch (of 9):
Introduce new definitions and variables to separate the user interface in
/proc/sys/kernel from the internal run state of the lockup detectors. The
internal run state is represented by two bits in a new variable that is
named 'watchdog_enabled'. This helps simplify the code, for example:
- In order to check if any of the two lockup detectors is enabled,
it is sufficient to check if 'watchdog_enabled' is not zero.
- In order to enable/disable one or both lockup detectors,
it is sufficient to set/clear one or both bits in 'watchdog_enabled'.
- Concurrent updates of 'watchdog_enabled' need not be synchronized via
a spinlock or a mutex. Updates can either be atomic or concurrency can
be detected by using 'cmpxchg'.
Signed-off-by: Ulrich Obergfell <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In some cases we don't want hard lockup detection enabled by default.
An example is when running as a guest. Introduce
watchdog_enable_hardlockup_detector(bool)
allowing those cases to disable hard lockup detection. This must be
executed early by the boot processor from e.g. smp_prepare_boot_cpu, in
order to allow kernel command line arguments to override it, as well as
to avoid hard lockup detection being enabled before we've had a chance
to indicate that it's unwanted. In summary,
initial boot: default=enabled
smp_prepare_boot_cpu
watchdog_enable_hardlockup_detector(false): default=disabled
cmdline has 'nmi_watchdog=1': default=enabled
The running kernel still has the ability to enable/disable at any time
with /proc/sys/kernel/nmi_watchdog us usual. However even when the
default has been overridden /proc/sys/kernel/nmi_watchdog will initially
show '1'. To truly turn it on one must disable/enable it, i.e.
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdog
This patch will be immediately useful for KVM with the next patch of this
series. Other hypervisor guest types may find it useful as well.
[[email protected]: fix build]
[[email protected]: fix compile issues on sparc]
Signed-off-by: Ulrich Obergfell <[email protected]>
Signed-off-by: Andrew Jones <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently APEI depends on x86 architecture. It is because of NMI hardware
error notification of GHES which is currently supported by x86 only.
However, many other APEI features can be still used perfectly by other
architectures.
This commit adds two symbols:
1. HAVE_ACPI_APEI for those archs which support APEI.
2. HAVE_ACPI_APEI_NMI which is used for NMI code isolation in ghes.c
file. NMI related data and functions are grouped so they can be wrapped
inside one #ifdef section. Appropriate function stubs are provided for
!NMI case.
Note there is no functional changes for x86 due to hard selected
HAVE_ACPI_APEI and HAVE_ACPI_APEI_NMI symbols.
Signed-off-by: Tomasz Nowicki <[email protected]>
Acked-by: Borislav Petkov <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
|
|
A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period to time, without giving
other tasks a chance to run.
Currently, upon detection of this condition by the per-cpu watchdog
task, debug information (including a stack trace) is sent to the system
log.
On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e. the owner/holder of the contended resource) is
reported to the user. Often this information has proven to be
insufficient to assist debugging efforts.
To avoid loss of useful debug information, for architectures which
support NMI, this patch makes it possible to improve soft lockup
reporting. This is accomplished by issuing an NMI to each cpu to obtain
a stack trace.
If NMI is not supported we just revert back to the old method. A sysctl
and boot-time parameter is available to toggle this feature.
[[email protected]: add CONFIG_SMP in certain areas]
[[email protected]: additional CONFIG_SMP=n optimisations]
[[email protected]: fix warning]
Signed-off-by: Aaron Tomlin <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Jan Moskyto Matejka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current. For
instance if one was previously captured recently.
This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.
Patch x86 and sparc to support new routine.
[[email protected]: add stub in #else clause]
[[email protected]: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
[[email protected]: undo C99ism]
Signed-off-by: Aaron Tomlin <[email protected]>
Signed-off-by: Don Zickus <[email protected]>
Acked-by: David S. Miller <[email protected]>
Cc: Mateusz Guzik <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We have two very conflicting state variable names in the
watchdog:
* watchdog_enabled: This one reflects the user interface. It's
set to 1 by default and can be overriden with boot options
or sysctl/procfs interface.
* watchdog_disabled: This is the internal toggle state that
tells if watchdog threads, timers and NMI events are currently
running or not. This state mostly depends on the user settings.
It's a convenient state latch.
Now we really need to find clearer names because those
are just too confusing to encourage deep review.
watchdog_enabled now becomes watchdog_user_enabled to reflect
its purpose as an interface.
watchdog_disabled becomes watchdog_running to suggest its
role as a pure internal state.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Srivatsa S. Bhat <[email protected]>
Cc: Anish Singh <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Don Zickus <[email protected]>
|
|
ARCH_HAS_NMI_WATCHDOG is a macro defined by arch, but config
HARDLOCKUP_DETECTOR depends on it. This is wrong, ARCH_HAS_NMI_WATCHDOG
has to be a Kconfig config, and arch's need it should select it
explicitly.
Signed-off-by: WANG Cong <[email protected]>
Acked-by: Don Zickus <[email protected]>
Acked-by: Mike Frysinger <[email protected]>
Cc: David Howells <[email protected]>
Cc: David Miller <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
on watchdog_thresh
Before the conversion of the NMI watchdog to perf event, the
watchdog timeout was 5 seconds. Now it is 60 seconds. For my
particular application, netbooks, 5 seconds was a better
timeout. With a short timeout, we catch faults earlier and are
able to send back a panic. With a 60 second timeout, the user is
unlikely to wait and will instead hit the power button, causing
us to lose the panic info.
This change configures the NMI period to watchdog_thresh and
sets the softlockup_thresh to watchdog_thresh * 2. In addition,
watchdog_thresh was reduced to 10 seconds as suggested by Ingo
Molnar.
Signed-off-by: Mandeep Singh Baines <[email protected]>
Cc: Marcin Slusarz <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
|
|
This restores the previous behavior of softlock_thresh.
Currently, setting watchdog_thresh to zero causes the watchdog
kthreads to consume a lot of CPU.
In addition, the logic of proc_dowatchdog_thresh and
proc_dowatchdog_enabled has been factored into proc_dowatchdog.
Signed-off-by: Mandeep Singh Baines <[email protected]>
Cc: Marcin Slusarz <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
LKML-Reference: <[email protected]>
|
|
CONFIG_HARDLOCKUP_DETECTOR
The x86 arch has shifted its use of the nmi_watchdog from a
local implementation to the global one provide by
kernel/watchdog.c. This shift has caused a whole bunch of
compile problems under different config options. I attempt to
simplify things with the patch below.
In order to simplify things, I had to come to terms with the
meaning of two terms ARCH_HAS_NMI_WATCHDOG and
CONFIG_HARDLOCKUP_DETECTOR. Basically they mean the same thing,
the former on a local level and the latter on a global level.
With the old x86 nmi watchdog gone, there is no need to rely on
defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't
make sense any more. x86 will now use the global
implementation.
The changes below do a few things. First it changes the few
places that relied on ARCH_HAS_NMI_WATCHDOG to use
CONFIG_X86_LOCAL_APIC (the former was an alias for the latter
anyway, so nothing unusual here). Those pieces of code were
relying more on local apic functionality the nmi watchdog
functionality, so the change should make sense.
Second, I removed the x86 implementation of
touch_nmi_watchdog(). It isn't need now, instead x86 will rely
on kernel/watchdog.c's implementation.
Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from
x86. And tweaked the include/linux/nmi.h file to tell users to
look for an externally defined touch_nmi_watchdog in the case of
ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This
changes removes some of the ugliness in that file.
Finally, I added a Kconfig dependency for
CONFIG_HARDLOCKUP_DETECTOR that said you can't have
ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR. You can
only have one nmi_watchdog.
Tested with
ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken
configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig,
(various broken configs)
Hopefully, after this patch I won't get any more compile broken
emails. :-)
v3:
changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function
prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.
Signed-off-by: Don Zickus <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
My patch that removed the old x86 nmi watchdog broke other
arches. This change reverts a piece of that patch and puts the
change in the correct spot.
Signed-off-by: Don Zickus <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that the bulk of the old nmi_watchdog is gone, remove all
the stub variables and hooks associated with it.
This touches lots of files mainly because of how the io_apic
nmi_watchdog was implemented. Now that the io_apic nmi_watchdog
is forever gone, remove all its fingers.
Most of this code was not being exercised by virtue of
nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything to
risky here.
Signed-off-by: Don Zickus <[email protected]>
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that we have a new nmi_watchdog that is more generic and
sits on top of the perf subsystem, we really do not need the old
nmi_watchdog any more.
In addition, the old nmi_watchdog doesn't really work if you are
using the default clocksource, hpet. The old nmi_watchdog code
relied on local apic interrupts to determine if the cpu is still
alive. With hpet as the clocksource, these interrupts don't
increment any more and the old nmi_watchdog triggers false
postives.
This piece removes the old nmi_watchdog code and stubs out any
variables and functions calls. The stubs are the same ones used
by the new nmi_watchdog code, so it should be well tested.
Signed-off-by: Don Zickus <[email protected]>
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Combining the softlockup and hardlockup code causes watchdog.c
to build even without the hardlockup detection support.
So if an arch, that has the previous and the new nmi watchdog
implementations cohabiting, wants to know if the generic one
is in use, CONFIG_LOCKUP_DETECTOR is not a reliable check.
We need to use CONFIG_HARDLOCKUP_DETECTOR instead.
Fixes:
kernel/built-in.o: In function `touch_nmi_watchdog':
(.text+0x449bc): multiple definition of `touch_nmi_watchdog'
arch/sparc/kernel/built-in.o:(.text+0x11b28): first defined here
Signed-off-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
LKML-Reference: <[email protected]>
[ use CONFIG_HARDLOCKUP_DETECTOR instead of CONFIG_PERF_EVENTS_NMI]
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
The new nmi_watchdog (which uses the perf event subsystem) is very
similar in structure to the softlockup detector. Using Ingo's
suggestion, I combined the two functionalities into one file:
kernel/watchdog.c.
Now both the nmi_watchdog (or hardlockup detector) and softlockup
detector sit on top of the perf event subsystem, which is run every
60 seconds or so to see if there are any lockups.
To detect hardlockups, cpus not responding to interrupts, I
implemented an hrtimer that runs 5 times for every perf event
overflow event. If that stops counting on a cpu, then the cpu is
most likely in trouble.
To detect softlockups, tasks not yielding to the scheduler, I used the
previous kthread idea that now gets kicked every time the hrtimer fires.
If the kthread isn't being scheduled neither is anyone else and the
warning is printed to the console.
I tested this on x86_64 and both the softlockup and hardlockup paths
work.
V2:
- cleaned up the Kconfig and softlockup combination
- surrounded hardlockup cases with #ifdef CONFIG_PERF_EVENTS_NMI
- seperated out the softlockup case from perf event subsystem
- re-arranged the enabling/disabling nmi watchdog from proc space
- added cpumasks for hardlockup failure cases
- removed fallback to soft events if no PMU exists for hard events
V3:
- comment cleanups
- drop support for older softlockup code
- per_cpu cleanups
- completely remove software clock base hardlockup detector
- use per_cpu masking on hard/soft lockup detection
- #ifdef cleanups
- rename config option NMI_WATCHDOG to LOCKUP_DETECTOR
- documentation additions
V4:
- documentation fixes
- convert per_cpu to __get_cpu_var
- powerpc compile fixes
V5:
- split apart warn flags for hard and soft lockups
TODO:
- figure out how to make an arch-agnostic clock2cycles call
(if possible) to feed into perf events as a sample period
[fweisbec: merged conflict patch]
Signed-off-by: Don Zickus <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Eric Paris <[email protected]>
Cc: Randy Dunlap <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Mostly copy/paste whitespace damage with a couple of nitpicks by
the checkpatch script. Fix the struct definition as requested by Ingo too.
Signed-off-by: Don Zickus <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
--
arch/x86/kernel/apic/hw_nmi.c | 14 +++++------
arch/x86/kernel/traps.c | 6 ++--
include/linux/nmi.h | 2 -
kernel/nmi_watchdog.c | 51 ++++++++++++++++++++----------------------
4 files changed, 36 insertions(+), 37 deletions(-)
|
|
The original patch was x86_64 centric. Changed the code to make
it less so.
ested by building and running on a powerpc.
Signed-off-by: Don Zickus <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
These are the bits that enable the new nmi_watchdog and safely
isolate the old nmi_watchdog. Only one or the other can run,
not both at the same time.
Signed-off-by: Don Zickus <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
As Andrew noted, my previous patch ("debug lockups: Improve lockup
detection") broke/removed SysRq-L support from architecture that do
not provide a __trigger_all_cpu_backtrace implementation.
Restore a fallback path and clean up the SysRq-L machinery a bit:
- Rename the arch method to arch_trigger_all_cpu_backtrace()
- Simplify the define
- Document the method a bit - in the hope of more architectures
adding support for it.
[ The patch touches Sparc code for the rename. ]
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: "David S. Miller" <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
During kernel bootup, a new T60 laptop (CoreDuo, 32-bit) hangs about
10%-20% of the time in acpi_init():
Calling initcall 0xc055ce1a: topology_init+0x0/0x2f()
Calling initcall 0xc055d75e: mtrr_init_finialize+0x0/0x2c()
Calling initcall 0xc05664f3: param_sysfs_init+0x0/0x175()
Calling initcall 0xc014cb65: pm_sysrq_init+0x0/0x17()
Calling initcall 0xc0569f99: init_bio+0x0/0xf4()
Calling initcall 0xc056b865: genhd_device_init+0x0/0x50()
Calling initcall 0xc056c4bd: fbmem_init+0x0/0x87()
Calling initcall 0xc056dd74: acpi_init+0x0/0x1ee()
It's a hard hang that not even an NMI could punch through! Frustratingly,
adding printks or function tracing to the ACPI code made the hangs go away
...
After some time an additional detail emerged: disabling the NMI watchdog
made these occasional hangs go away.
So i spent the better part of today trying to debug this and trying out
various theories when i finally found the likely reason for the hang: if
acpi_ns_initialize_devices() executes an _INI AML method and an NMI
happens to hit that AML execution in the wrong moment, the machine would
hang. (my theory is that this must be some sort of chipset setup method
doing stores to chipset mmio registers?)
Unfortunately given the characteristics of the hang it was sheer
impossible to figure out which of the numerous AML methods is impacted
by this problem.
As a workaround i wrote an interface to disable chipset-based NMIs while
executing _INI sections - and indeed this fixed the hang. I did a
boot-loop of 100 separate reboots and none hung - while without the patch
it would hang every 5-10 attempts. Out of caution i did not touch the
nmi_watchdog=2 case (it's not related to the chipset anyway and didnt
hang).
I implemented this for both x86_64 and i686, tested the i686 laptop both
with nmi_watchdog=1 [which triggered the hangs] and nmi_watchdog=2, and
tested an Athlon64 box with the 64-bit kernel as well. Everything builds
and works with the patch applied.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Len Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When a spinlock lockup occurs, arrange for the NMI code to emit an all-cpu
backtrace, so we get to see which CPU is holding the lock, and where.
Cc: Andi Kleen <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Badari Pulavarty <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
|
|
touch_nmi_watchdog() calls touch_softlockup_watchdog() on both
architectures that implement it (i386 and x86_64). On other architectures
it does nothing at all. touch_nmi_watchdog() should imply
touch_softlockup_watchdog() on all architectures. Suggested by Andi Kleen.
[[email protected]: s390 fix]
Signed-off-by: Michal Schmidt <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Signed-off-by: Heiko Carstens <[email protected]>
Cc: Michal Schmidt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!
|