diff options
author | Tony Luck <tony.luck@intel.com> | 2023-11-15 11:54:49 -0800 |
---|---|---|
committer | Borislav Petkov (AMD) <bp@alien8.de> | 2023-12-15 14:52:01 +0100 |
commit | 7eae17c4add5de46efcca45356388f480103e6d9 (patch) | |
tree | c69da43d82da16e00cc0993136490b648f6742fe /scripts/gcc-plugins/sancov_plugin.c | |
parent | 3ed57b41a4125609e9fd03e32228aec61d95fe1f (diff) |
x86/mce: Add per-bank CMCI storm mitigation
This is the core functionality to track CMCI storms at the machine check
bank granularity. Subsequent patches will add the vendor specific hooks
to supply input to the storm detection and take actions on the start/end
of a storm.
machine_check_poll() is called both by the CMCI interrupt code, and for
periodic polls from a timer. Add a hook in this routine to maintain
a bitmap history for each bank showing whether the bank logged an
corrected error or not each time it is polled.
In normal operation the interval between polls of these banks determines
how far to shift the history. The 64 bit width corresponds to about one
second.
When a storm is observed a CPU vendor specific action is taken to reduce
or stop CMCI from the bank that is the source of the storm. The bank is
added to the bitmap of banks for this CPU to poll. The polling rate is
increased to once per second. During a storm each bit in the history
indicates the status of the bank each time it is polled. Thus the
history covers just over a minute.
Declare a storm for that bank if the number of corrected interrupts seen
in that history is above some threshold (defined as 5 in this series,
could be tuned later if there is data to suggest a better value).
A storm on a bank ends if enough consecutive polls of the bank show no
corrected errors (defined as 30, may also change). That calls the CPU
vendor specific function to revert to normal operational mode, and
changes the polling rate back to the default.
[ bp: Massage. ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231115195450.12963-3-tony.luck@intel.com
Diffstat (limited to 'scripts/gcc-plugins/sancov_plugin.c')
0 files changed, 0 insertions, 0 deletions