How to safely change sbd timeout settings in a running pacemaker cluster
This document (7023689) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 12
Situation
For various reasons, the timeout settings stored on the configured sbd device(s) may need to be adjusted. In the following example, the current watchdog timeout (90 seconds) and msgwait timeout (180 seconds) are to be changed:
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe dump
==Dumping header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe
Header version : 2.1
UUID : 62caa488-cbee-4449-84c3-5fd0659dcc09
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 90
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 180
==Header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe is dumped
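When the current values are needed in a script, for example to record them before the change, the header lines can be parsed from the dump output. The following is a minimal sketch that assumes the "Timeout (NAME) : VALUE" layout shown above. The function name get_sbd_timeout is hypothetical and not part of sbd.

```shell
# get_sbd_timeout NAME
# Hypothetical helper (not part of sbd): reads the output of
# "sbd -d <device> dump" on stdin and prints the value of the header line
# "Timeout (NAME)", where NAME is watchdog, allocate, loop or msgwait.
# Prints nothing if no such line exists.
get_sbd_timeout() {
    awk -v key="Timeout ($1)" '
        # Only the line starting with e.g. "Timeout (msgwait)" is of interest
        index($0, key) == 1 {
            sub(/^[^:]*:[ \t]*/, "")   # drop "Timeout (msgwait) : "
            sub(/[ \t]+$/, "")         # drop trailing blanks
            print
            exit
        }'
}
```

Usage (device path is a placeholder): sbd -d <device> dump | get_sbd_timeout msgwait. For the dump above this prints 180.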
Resolution
- The following commands need to be executed as the root user or a user with equivalent permissions.
- Make sure none of the cluster resources is in a stopped state before putting the cluster into maintenance mode.
- Make sure the cluster is stopped and restarted as described below; otherwise the new settings will not be activated on the cluster nodes.
- Verify that the sbd service was successfully stopped by checking the output of: systemctl status sbd
- If existing sbd devices are exchanged for new ones, remember to update SBD_DEVICE in /etc/sysconfig/sbd accordingly.
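Because /etc/sysconfig/sbd is the list of devices the nodes actually use, it can help to derive the "-d <device>" arguments for the dump, create and list commands from that file instead of typing the long by-id paths by hand. This is a minimal sketch assuming SBD_DEVICE holds one or more device paths separated by ";". The function name sbd_device_args is hypothetical.

```shell
# sbd_device_args [CONFIG]
# Hypothetical helper: reads SBD_DEVICE from /etc/sysconfig/sbd (or CONFIG),
# where multiple devices are separated by ";", and prints one "-d <device>"
# pair per device, all on one line.
sbd_device_args() {
    conf=${1:-/etc/sysconfig/sbd}
    # Source the file in a subshell so the caller's environment is untouched
    (
        . "$conf" || exit 1
        IFS=';'
        set -f                      # no pathname expansion while splitting
        sep=''
        for dev in $SBD_DEVICE; do
            [ -n "$dev" ] || continue
            printf '%s-d %s' "$sep" "$dev"
            sep=' '
        done
        echo
    )
}
```

Usage: sbd $(sbd_device_args) dump. The command substitution is deliberately left unquoted so that it splits into separate arguments; this assumes the device paths contain no blanks, which holds for /dev/disk/by-id names.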
1. Run the following command to display the current settings of the sbd device:
# sbd -d <device> dump
2. Put the cluster into maintenance mode:
# crm configure property maintenance-mode=true
3. Verify that all cluster resources are in the "unmanaged" state:
# crm status
4. Stop the cluster services on all nodes (run on each node):
# crm cluster stop
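5. Recreate the metadata on the sbd device(s), where -1 sets the watchdog timeout and -4 sets the msgwait timeout (both in seconds):
# sbd -d <device> [-d <device> ...] -4 <msgwait> -1 <watchdog> create
Full example (three sbd disks, watchdog timeout 10, msgwait timeout 20):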
# sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe -d /dev/disk/by-id/scsi-36000c29d7b18a8c4a6e980da7fd74fab -d /dev/disk/by-id/scsi-36000c2912306cd2a42adc9c0c95f450c -4 20 -1 10 create
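6. Start the cluster services on all nodes (run on each node):
# crm cluster start
7. Check that the nodes have allocated their slots on the sbd device(s):
# sbd -d <device> list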
8. Put the cluster back into normal mode:
# crm configure property maintenance-mode=false
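Step 5 is the only step that overwrites data on the sbd device(s), so it can be worth wrapping it in a guard. The following is a minimal sketch, not an official tool: the function name recreate_sbd_header and the DRY_RUN variable are hypothetical. It refuses to run while sbd is still active on the local node, warns when msgwait is less than twice the watchdog timeout (the ratio the sbd man page recommends and the one used in the full example above), and by default only prints the sbd command. Steps 4 and 6 still have to be performed on every node.

```shell
# recreate_sbd_header WATCHDOG MSGWAIT DEVICE...
# Hypothetical wrapper around step 5. Prints the sbd command unless
# DRY_RUN=0 is set, in which case the header is actually recreated.
recreate_sbd_header() {
    if [ "$#" -lt 3 ]; then
        echo "usage: recreate_sbd_header WATCHDOG MSGWAIT DEVICE..." >&2
        return 2
    fi
    watchdog=$1
    msgwait=$2
    shift 2

    # Step 4 must be finished first: sbd has to be stopped on this node
    if systemctl is-active --quiet sbd; then
        echo "sbd is still active; stop the cluster on all nodes first" >&2
        return 1
    fi

    # msgwait is normally twice the watchdog timeout
    if [ "$msgwait" -lt $((2 * watchdog)) ]; then
        echo "warning: msgwait $msgwait is less than 2 x watchdog $watchdog" >&2
    fi

    # Turn "dev1 dev2 ..." into "-d dev1 -d dev2 ..." in the positional
    # parameters (the for list is expanded once, before the loop runs)
    for dev in "$@"; do
        set -- "$@" -d "$dev"
        shift
    done

    if [ "${DRY_RUN:-1}" = "0" ]; then
        sbd "$@" -4 "$msgwait" -1 "$watchdog" create
    else
        echo "sbd $* -4 $msgwait -1 $watchdog create"
    fi
}
```

Usage: run recreate_sbd_header 10 20 <device> [<device> ...] once to review the printed command, then repeat it with DRY_RUN=0 on one node only.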
Additional Information
Output of sbd dump and sbd list after the change:
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe dump
==Dumping header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe
Header version : 2.1
UUID : f2faed5e-c0a5-46a8-8fb8-45d7bab44182
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 10
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 20
==Header on disk /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe is dumped
sles12cluster2:~ # sbd -d /dev/disk/by-id/scsi-36000c29c0348eb3640b99be0f96e80fe list
0 sles12cluster2 clear
1 sles12cluster1 clear
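The list output of step 7 can also be checked mechanically, which is convenient when several sbd devices are in use. This is a minimal sketch assuming the layout shown above (slot number, node name, status per line). The function name sbd_slots_clear is hypothetical. It also fails on empty input, since an empty slot list after step 6 usually means the nodes have not started sbd.

```shell
# sbd_slots_clear
# Hypothetical helper: reads "sbd -d <device> list" output on stdin and
# succeeds only if at least one slot is listed and every slot is "clear".
# Slots with any other status are reported on stderr.
sbd_slots_clear() {
    awk '
        NF >= 3 {
            n++
            if ($3 != "clear") {
                bad++
                print "slot " $1 " (" $2 "): " $3 > "/dev/stderr"
            }
        }
        END {
            if (n == 0 || bad > 0) exit 1
            exit 0
        }'
}
```

Usage: sbd -d <device> list | sbd_slots_clear && echo "all slots clear"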
Output of systool -vc watchdog after the change (12 SP4 and later):
systool -vc watchdog
Class = "watchdog"

  Class Device = "watchdog0"
  Class Device path = "/sys/devices/virtual/watchdog/watchdog0"
    bootstatus = "0"
    dev = "249:0"
    identity = "Software Watchdog"
    nowayout = "0"
    pretimeout = "0"
    pretimeout_available_governors= "noop"
    pretimeout_governor = "noop"
    state = "active"
    status = "0x8000"
    timeout = "10"
    uevent = "MAJOR=249 MINOR=0 DEVNAME=watchdog0"
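The timeout attribute printed by systool is read from sysfs, so the same check can be done without systool by comparing that file with the new sbd watchdog timeout. This is a minimal sketch assuming the watchdog used by sbd is watchdog0 and the kernel exposes the attribute (12 SP4 and later, as noted above); if sbd uses another watchdog device, pass its timeout file explicitly. The function name watchdog_timeout_matches is hypothetical.

```shell
# watchdog_timeout_matches EXPECTED [SYSFS_FILE]
# Hypothetical helper: compares the timeout currently set in the kernel
# watchdog driver (the "timeout" attribute shown by systool) with the
# expected sbd watchdog timeout. Default file assumes watchdog0.
watchdog_timeout_matches() {
    expected=$1
    file=${2:-/sys/class/watchdog/watchdog0/timeout}
    if [ ! -r "$file" ]; then
        echo "cannot read $file (older kernel or no watchdog0?)" >&2
        return 2
    fi
    actual=$(cat "$file")
    if [ "$actual" != "$expected" ]; then
        echo "watchdog timeout is $actual, expected $expected" >&2
        return 1
    fi
}
```

Usage: watchdog_timeout_matches 10 succeeds for the systool output above.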
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7023689
- Creation Date: 30-Jan-2019
- Modified Date: 20-Feb-2024
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com