Settings for long timeout in SBD_DELAY_START
This document (7023572) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension
Situation
Issue one: the SBD service can time out during start, because the delay imposed by SBD_DELAY_START can be longer than the default start timeout that systemd applies to system services.
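The timing conflict behind issue one can be illustrated with a small shell sketch. The helper name delay_fits_timeout and the 120-second delay are assumptions chosen for illustration. The 90 seconds are systemd's stock DefaultTimeoutStartSec, and the 600 seconds are the TimeoutSec value used in the Resolution section. Substitute the delay that actually applies to your SBD setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of sbd or systemd): succeeds when the
# start delay in seconds is shorter than the start timeout in seconds.
delay_fits_timeout() {
    delay="$1"
    timeout="$2"
    [ "$delay" -lt "$timeout" ]
}

# Assumed example: a 120 s start delay against the stock 90 s timeout.
if delay_fits_timeout 120 90; then
    echo "fits"
else
    echo "sbd start would time out"
fi

# The same delay against the 600 s timeout set in this document.
if delay_fits_timeout 120 600; then
    echo "fits"
else
    echo "sbd start would time out"
fi
```

With these example values the first check fails and the second passes, which is why raising the timeout resolves the start failure.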
Issue two: on its return, the fenced node starts corosync and thereby blocks the cluster. From the cluster's perspective everything appears to have worked, fencing for example, yet the surviving node waited until the fenced node returned.
The logs show entries similar to:
Dec 03 15:29:25 [3533] animal pengine: notice: LogActions: Start fs_mysap (animal - blocked)
Resolution
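Copy the packaged unit file to /etc/systemd/system, where it takes precedence over the version in /usr/lib/systemd/system:
cp /usr/lib/systemd/system/sbd.service /etc/systemd/system/sbd.service
Then edit
/etc/systemd/system/sbd.service
and add to the section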
[Unit]
Before=corosync.service
and add to the section
[Service]
TimeoutSec=600
so that the file looks like:
[Unit]
Description=Shared-storage based fencing daemon
Documentation=man:sbd(8)
Before=pacemaker.service
Before=dlm.service
Before=corosync.service
After=systemd-modules-load.service iscsi.service
PartOf=corosync.service
RefuseManualStop=true
RefuseManualStart=true
[Service]
Type=forking
PIDFile=/var/run/sbd.pid
EnvironmentFile=-/etc/sysconfig/sbd
ExecStart=/usr/sbin/sbd $SBD_OPTS -p /var/run/sbd.pid watch
ExecStop=/usr/bin/kill -TERM $MAINPID
TimeoutSec=600
# Could this benefit from exit codes for restart?
# Does this need to be set to msgwait * 1.2?
# TimeoutSec=
# If SBD crashes, it'll very likely suicide immediately due to the
# hardware watchdog. But one can always try.
Restart=on-abort
[Install]
RequiredBy=corosync.service
RequiredBy=pacemaker.service
RequiredBy=dlm.service
Then reload the systemd configuration with
systemctl daemon-reload
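After the reload, it can be worth confirming that the override is in place. The following is a minimal sketch, not an official procedure. The helper name unit_has_settings is hypothetical. It only checks that the copied unit file contains the two lines added above, and then asks systemd which values it loaded.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given unit file contains both
# settings from this document as complete, uncommented lines.
unit_has_settings() {
    unit_file="$1"
    grep -qx 'Before=corosync.service' "$unit_file" &&
        grep -qx 'TimeoutSec=600' "$unit_file"
}

unit=/etc/systemd/system/sbd.service
if [ -f "$unit" ] && unit_has_settings "$unit"; then
    echo "override contains both settings"
else
    echo "override missing or incomplete"
fi

# Ask systemd for the values it loaded (only meaningful on a host
# where systemd is running and sbd.service is installed).
if command -v systemctl >/dev/null 2>&1; then
    systemctl show sbd.service -p TimeoutStartUSec -p Before || true
fi
```

The grep -x option matches whole lines only, so the commented "# TimeoutSec=" line in the packaged file does not count as a match.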
Cause
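Issue one is caused by the SBD start delay exceeding the default systemd start timeout, so systemd treats the start of sbd.service as failed.
Issue two is caused by the corosync service starting on the returning node without waiting for the SBD start delay to expire. With Before=corosync.service, systemd orders sbd.service ahead of corosync.service, so corosync is started only after the start of sbd has completed.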
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7023572
- Creation Date: 10-Dec-2018
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com