iTCO_wdt does not accept a Watchdog Timeout bigger than 63 seconds
This document (7011426) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 11 Service Pack 1
SUSE Linux Enterprise High Availability Extension 11 Service Pack 2
SUSE Linux Enterprise Server 10 Service Pack 4
SUSE Linux Enterprise High Availability Extension 11
Situation
The SBD device is configured with a watchdog timeout of 65 seconds:
jupiter:~ # sbd -d /dev/disk/by-id/scsi-mywatchdogdevice dump
==Dumping header on disk /dev/disk/by-id/scsi-mywatchdogdevice
Header version : 2
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 65
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 130
But the log files show entries like these on cluster start:
Nov 22 12:33:32 jupiter sbd: [3845]: ERROR: WDIOC_SETTIMEOUT: Failed to set watchdog timer to 65 seconds.: Invalid argument
Nov 22 12:33:32 jupiter sbd: [3845]: CRIT: Please validate your watchdog configuration!
Nov 22 12:33:32 jupiter sbd: [3845]: CRIT: Choose a different watchdog driver or specify -T to silence this check if you are sure.
These messages indicate that the watchdog timeout could not be passed to the hardware watchdog. As a result, the cluster is not protected by the watchdog. As the log file states, this is a critical issue and should be resolved as soon as possible.
This applies to all watchdog timeout values greater than 63 seconds when the iTCO_wdt hardware watchdog is used.
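A quick way to spot an affected configuration is to read the watchdog timeout from the sbd dump output and compare it against the 63-second limit stated above. The following is a minimal sketch, not part of sbd: the function name check_wdt_timeout and the variable ITCO_MAX are hypothetical, and the parsing assumes the dump format shown earlier in this section.

```shell
#!/bin/sh
# Hypothetical helper: reads "sbd ... dump" output on stdin and reports
# whether the watchdog timeout fits the iTCO_wdt limit from this article.
ITCO_MAX=63   # limit stated in this article for iTCO_wdt

check_wdt_timeout() {
    # Extract the number from the "Timeout (watchdog) : N" line.
    timeout=$(sed -n 's/^Timeout (watchdog)[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*$/\1/p')
    if [ -z "$timeout" ]; then
        echo "no watchdog timeout found"
        return 2
    fi
    if [ "$timeout" -gt "$ITCO_MAX" ]; then
        echo "watchdog timeout $timeout exceeds $ITCO_MAX"
        return 1
    fi
    echo "watchdog timeout $timeout is ok"
}
```

It could be used by piping the dump into the function, for example: sbd -d /dev/disk/by-id/scsi-mywatchdogdevice dump | check_wdt_timeout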
Resolution
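Recreate the SBD device with a watchdog timeout of 63 seconds or less, for example:
sbd -d /dev/disk/by-id/scsi-mywatchdogdevice -1 60 -4 120 create
Here -1 sets the watchdog timeout and -4 sets the msgwait timeout. Afterwards, restart all cluster nodes to resolve the issue.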
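To avoid choosing an unsupported value when recreating the device, the command line can be generated from the desired watchdog timeout. This is only a sketch under stated assumptions: the function name sbd_create_cmd is hypothetical, and it assumes msgwait should be twice the watchdog timeout, which matches the value pairs in this article (65/130 and 60/120) but is not confirmed here as a general rule. The function only prints the command; it does not run sbd.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) an sbd create command for a
# given device and watchdog timeout.
# Assumption: msgwait = 2 x watchdog timeout, as in this article's values.
ITCO_MAX=63   # limit stated in this article for iTCO_wdt

sbd_create_cmd() {
    device=$1
    wdt=$2
    if [ "$wdt" -gt "$ITCO_MAX" ]; then
        echo "watchdog timeout $wdt exceeds $ITCO_MAX" >&2
        return 1
    fi
    msgwait=$((wdt * 2))
    echo "sbd -d $device -1 $wdt -4 $msgwait create"
}
```

For example, sbd_create_cmd /dev/disk/by-id/scsi-mywatchdogdevice 60 prints the create command with -1 60 -4 120. Review the printed command before running it, since create rewrites the SBD header on the device.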
If a watchdog timeout of more than 63 seconds is really necessary, check whether another hardware watchdog is available.
As a last resort, softdog could be used.
Cause
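The iTCO_wdt driver rejects timer values above a maximum that depends on the TCO version of the hardware. The relevant check in the driver source is:
if (((iTCO_wdt_private.iTCO_version == 2) && (tmrval > 0x3ff)) ||
    ((iTCO_wdt_private.iTCO_version == 1) && (tmrval > 0x03f)))
        return -EINVAL;
0x03f is 63 and 0x3ff is 1023 in decimal. A timer value above the limit is rejected with EINVAL, which appears in the sbd log as "Invalid argument".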
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7011426
- Creation Date: 27-Nov-2012
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com