How to load the correct watchdog kernel module
This document (7016880) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server for SAP Applications 11 SP4
SUSE Linux Enterprise Server 11 Service Pack 3 (SLES 11 SP3)
SUSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4)
SUSE Linux Enterprise High Availability Extension 11 Service Pack 3
SUSE Linux Enterprise High Availability Extension 11 Service Pack 4
Situation
Automatic probing for the right watchdog kernel module sometimes fails. In that case the correct module has to be configured manually.
Resolution
Finding out the right kernel module for a given system is not trivial, which is why automatic probing fails quite often. As a result, many modules are already loaded before the right one gets a chance.
A proven solution is to load the proper watchdog driver very early during system boot, before the auto-probing takes place.
To load the right watchdog kernel module at boot, perform the following steps:
- The right watchdog module has to be determined.
- Any wrong watchdog module has to be unloaded.
- The right watchdog module has to be loaded.
- In order to automate loading of the right watchdog module, it has to be included in the boot process.
- The watchdog has to be tested.
Please note that only one piece of software may access the watchdog timer. Some hardware vendors ship systems management software that uses the watchdog for system resets. Such software has to be disabled if the watchdog is to be used by the SBD that comes with SLE-HA.
Implementation
Step 1. The right watchdog module has to be determined.
Currently there is no stable programmatic approach that determines the right watchdog kernel module in every case.
- On HP hardware the "hpwdt" module should work.
- For systems with an Intel TCO "iTCO_wdt" can be used. Dell, Fujitsu, and Lenovo usually fall into this category.
- Inside a VM on z/VM on an IBM mainframe "vmwatchdog" might be used.
- Inside a Xen VM (aka DomU) "xen_wdt" is a good choice.
- "softdog" is the most generic driver, but it is recommended that you use one with actual hardware integration.
See /lib/modules/.../kernel/drivers/watchdog in the kernel package for a list of choices. The hardware vendor should also be able to name the right watchdog. A test as described in step 5 below shows whether a loaded module works.
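The directory listing mentioned above can be turned into a list of module names that modprobe accepts. The following is a minimal sketch, not a tool shipped with SLES: the function name list_watchdog_modules is hypothetical, and the driver directory is passed in as an argument rather than assumed.

```shell
# Hypothetical helper: print the module names found in a watchdog
# driver directory, one per line, with the ".ko" suffix (and any
# compression suffix after it) stripped.
list_watchdog_modules() {
    dir="$1"
    for f in "$dir"/*.ko*; do
        # Skip the unexpanded pattern if the directory has no modules.
        [ -e "$f" ] || continue
        name=$(basename "$f")
        printf '%s\n' "${name%%.ko*}"
    done
}

# Usage (pass the watchdog driver directory of the kernel in use):
#   list_watchdog_modules <watchdog_driver_directory>
```

Running modinfo on a name from this list prints the module's description, which can help to match it to the hardware.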
Step 2. Any wrong watchdog module has to be unloaded.
List the loaded watchdog modules:
# lsmod | grep -e dog -e wdt
Note: A loaded watchdog module whose name contains neither "dog" nor "wdt" will not show up. See /lib/modules/.../kernel/drivers/watchdog for modules shipped with the SLES operating system.
Unload watchdog modules that are not needed:
# rmmod <wrong_module>
Repeat the above for every module that is not needed.
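When several watchdog modules are loaded, the list of candidates for unloading can be derived from the lsmod output. This is a sketch under stated assumptions: the function name wrong_watchdog_modules is hypothetical, it matches on the module name column only (unlike the grep above, which matches the whole line), and it has the same blind spot for names without "dog" or "wdt".

```shell
# Hypothetical helper: read lsmod output on stdin and print the names
# of loaded modules that contain "dog" or "wdt", except the module
# that should stay loaded (first argument).
wrong_watchdog_modules() {
    keep="$1"
    # NR > 1 skips the lsmod header line; $1 is the module name.
    awk -v keep="$keep" 'NR > 1 && $1 ~ /dog|wdt/ && $1 != keep { print $1 }'
}

# Usage: review the list first, then unload each entry with rmmod.
#   lsmod | wrong_watchdog_modules hpwdt
```

The function only prints names and unloads nothing, so the result can be checked before rmmod is run.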
Step 3. The right watchdog module has to be loaded.
Load the right module:
# modprobe softdog
Note: We use the "softdog" module here as an example. The right module has to be determined following step 1.
Step 4. Automate module loading
In order to automate loading of the right watchdog module, it has to be included into the boot process.
Add the right watchdog kernel module to the initrd. To do so, append the basename of the module to the INITRD_MODULES variable in /etc/sysconfig/kernel, then rebuild the initrd with mkinitrd:
# vi /etc/sysconfig/kernel
INITRD_MODULES=" ... softdog"
# mkinitrd
Note: We use the "softdog" module here as an example. The right module has to be determined following step 1.
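The manual edit of INITRD_MODULES can also be scripted. The following is a minimal sketch, not an official SUSE procedure: the function name add_initrd_module is hypothetical, and it assumes the variable sits on a single line in double quotes, as in the example above.

```shell
# Hypothetical helper: append a module name to INITRD_MODULES in a
# sysconfig file unless it is already listed.
# Arguments: module basename, path of the file to edit.
add_initrd_module() {
    module="$1"
    file="$2"
    # Already listed as a whole word inside the quotes: nothing to do.
    if grep -Eq "^INITRD_MODULES=\"([^\"]* )?${module}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Append the module before the closing quote; the unmodified
    # file is kept as "$file.bak".
    sed -i.bak -e "s/^INITRD_MODULES=\"\(.*\)\"/INITRD_MODULES=\"\1 ${module}\"/" "$file"
}

# Usage ("softdog" is only the example module from this document):
#   add_initrd_module softdog /etc/sysconfig/kernel
#   mkinitrd
```

The initrd still has to be rebuilt with mkinitrd afterwards, exactly as in the manual procedure.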
Step 5. The watchdog has to be tested.
Check whether the module was added to the initrd:
# zcat /boot/initrd | cpio -it 2>/dev/null | grep -e wdt -e dog
Note: A module whose name contains neither "dog" nor "wdt" will not show up.
Check whether the watchdog module was loaded:
# lsmod | grep -e wdt -e dog
Note: A module whose name contains neither "dog" nor "wdt" will not show up.
Check whether the watchdog module has created a device file for communication:
# ls -l /dev/watchdog
Check whether the watchdog is already used by a process. Only one process should use the watchdog:
# lsof /dev/watchdog
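The "only one process" rule can be checked by counting the distinct process IDs in the lsof output. This is a sketch under assumptions: the function name count_watchdog_users is hypothetical, and it relies on lsof's default output layout with one header line and the PID in the second column.

```shell
# Hypothetical helper: read lsof output on stdin and print the number
# of distinct processes listed in it.
count_watchdog_users() {
    # NR > 1 skips the header line; $2 is the PID column.
    awk 'NR > 1 { pids[$2] = 1 } END { n = 0; for (p in pids) n++; print n }'
}

# Usage:
#   lsof /dev/watchdog | count_watchdog_users
```

A result greater than 1 means that more than one process holds the watchdog device open.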
Check whether the freshly loaded watchdog module works. This test forces a hard reboot of the system. It usually takes up to 60 seconds until the hard reboot happens. File systems will not be shut down cleanly. Do not do this on a production system:
# echo "do not do this on a productive system:"
# echo "cat /dev/watchdog"
Online documentation
https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_storage_protect_fencing.html
http://linux-ha.org/wiki/SBD_Fencing
http://www.clusterlabs.org/doc/crm_fencing.html
http://code.metager.de/source/xref/linux/stable/drivers/s390/char/vmwatchdog.c
Kernel documentation
/usr/src/linux/Documentation/watchdog/watchdog-api.txt
Technical Information Documents
SBD Operation Guidelines for HAE Clusters
https://www.suse.com/support/kb/doc.php?id=7011346
SBD setup - debug and verify (OPENAIS)
https://www.suse.com/support/kb/doc.php?id=7009485
iTCO_wdt does not accept Watchdog Timeout bigger 63 seconds
https://www.suse.com/support/kb/doc.php?id=7011426
Manual pages
lsof(8)
mkinitrd(8)
modinfo(8)
modprobe(8)
SuSEconfig(8)
sbd(8)
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7016880
- Creation Date: 05-Oct-2015
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com