author Damian Muszynski <damian.muszynski@intel.com> 2023-06-30 19:03:57 +0200
committer Herbert Xu <herbert@gondor.apana.org.au> 2023-07-20 22:16:23 +1200
commit 359b84f8db942ef46d24de8aa397790c3fae22e0
tree 656d4f21c24ca4405460ee2cb6be24e7efa81d32 /Documentation
parent e2980ba57e797e58a5476fbc4296f40551fb3404
crypto: qat - add heartbeat feature

Under some circumstances, firmware in the QAT devices could become
unresponsive. The Heartbeat feature provides a mechanism to detect
unresponsive devices.

The QAT FW periodically writes to memory a set of counters that allow to
detect the liveness of a device. This patch adds logic to enable the
reporting of those counters, analyze them and report if a device is
alive or not.

In particular this adds
  (1) heartbeat enabling, reading and detection logic
  (2) reporting of heartbeat status and configuration via debugfs
  (3) documentation for the newly created sysfs entries
  (4) configuration of FW settings related to heartbeat, e.g. tick period
  (5) logic to convert time in ms (provided by the user) to clock ticks

This patch introduces a new folder in debugfs called heartbeat with the
following attributes:
 - status
 - queries_sent
 - queries_failed
 - config

All attributes except config are reading only. In particular:
 - `status` file returns 0 when device is operational and -1 otherwise.
 - `queries_sent` returns the total number of heartbeat queries sent.
 - `queries_failed` returns the total number of heartbeat queries failed.
 - `config` allows to adjust the frequency at which the firmware writes
   counters to memory. This period is given in milliseconds and it is
   fixed for GEN4 devices.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/testing/debugfs-driver-qat | 51
1 files changed, 51 insertions, 0 deletions
diff --git a/Documentation/ABI/testing/debugfs-driver-qat b/Documentation/ABI/testing/debugfs-driver-qat
index 22d39c0ca1b2..6731ffacc5f0 100644
--- a/Documentation/ABI/testing/debugfs-driver-qat
+++ b/Documentation/ABI/testing/debugfs-driver-qat
@@ -8,3 +8,54 @@ Description: (RO) Read returns the number of requests sent to the FW and the num
<N>: Number of requests sent from Acceleration Engine N to FW and responses
Acceleration Engine N received from FW
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/heartbeat/config
+Date: November 2023
+KernelVersion: 6.6
+Contact: qat-linux@intel.com
+Description: (RW) Read returns value of the Heartbeat update period.
+ Write to the file changes this period value.
+
+ This period should reflect planned polling interval of device
+ health status. High frequency Heartbeat monitoring wastes CPU cycles
+ but minimizes the customer’s system downtime. Also, if there are
+ large service requests that take some time to complete, high frequency
+ Heartbeat monitoring could result in false reports of unresponsiveness
+ and in those cases, period needs to be increased.
+
+ This parameter is effective only for c3xxx, c62x, dh895xcc devices.
+ 4xxx has this value internally fixed to 200ms.
+
+ Default value is set to 500. Minimal allowed value is 200.
+ All values are expressed in milliseconds.
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/heartbeat/queries_failed
+Date: November 2023
+KernelVersion: 6.6
+Contact: qat-linux@intel.com
+Description: (RO) Read returns the number of times the device became unresponsive.
+
+ Attribute returns value of the counter which is incremented when
+ status query results negative.
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/heartbeat/queries_sent
+Date: November 2023
+KernelVersion: 6.6
+Contact: qat-linux@intel.com
+Description: (RO) Read returns the number of times the control process checked
+ if the device is responsive.
+
+ Attribute returns value of the counter which is incremented on
+ every status query.
+
+What: /sys/kernel/debug/qat_<device>_<BDF>/heartbeat/status
+Date: November 2023
+KernelVersion: 6.6
+Contact: qat-linux@intel.com
+Description: (RO) Read returns the device health status.
+
+ Returns 0 when device is healthy or -1 when is unresponsive
+ or the query failed to send.
+
+ The driver does not monitor for Heartbeat. It is left for a user
+ to poll the status periodically.
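
Since the ABI text above leaves polling to user space, a monitoring tool could look roughly like the following. This is only a sketch under stated assumptions, not part of the patch: the helper names (read_int, heartbeat_snapshot, set_period_ms, poll) are hypothetical, and it assumes each attribute reads back as a single decimal integer and that config accepts a decimal value in milliseconds. The 200 ms lower bound is taken from the config description.

```python
import time
from pathlib import Path

# Minimal allowed period, per the "config" description in the ABI text.
MIN_PERIOD_MS = 200


def read_int(path: Path) -> int:
    """Read an attribute assumed to hold one decimal integer."""
    return int(path.read_text().strip())


def heartbeat_snapshot(hb_dir: Path) -> dict:
    """Read status first (0 means healthy, -1 unresponsive), then counters."""
    alive = read_int(hb_dir / "status") == 0
    return {
        "alive": alive,
        "queries_sent": read_int(hb_dir / "queries_sent"),
        "queries_failed": read_int(hb_dir / "queries_failed"),
    }


def set_period_ms(hb_dir: Path, period_ms: int) -> None:
    """Write a new update period to config, rejecting values below the minimum."""
    if period_ms < MIN_PERIOD_MS:
        raise ValueError(f"period must be at least {MIN_PERIOD_MS} ms")
    (hb_dir / "config").write_text(f"{period_ms}\n")


def poll(hb_dir: Path, interval_s: float, rounds: int, sleep=time.sleep) -> list:
    """Take `rounds` snapshots, pausing `interval_s` seconds after each one."""
    snapshots = []
    for _ in range(rounds):
        snapshots.append(heartbeat_snapshot(hb_dir))
        sleep(interval_s)
    return snapshots
```

To use it, pass the heartbeat directory of a real device, that is /sys/kernel/debug/qat_<device>_<BDF>/heartbeat with the placeholders filled in for the system at hand, and choose a polling interval that matches the period written to config. Reading debugfs normally requires root privileges.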