SBD Operation Guidelines for HAE Clusters
This document (7011346) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 11 (SLES)
Split Brain Detection (SBD)
Situation
Several variables govern SBD functionality. Each variable, and the command that shows its current value, is listed below.
TOTEM Token (Default=5000): The time, in milliseconds, spent detecting a failure of a processor.
# cat /etc/corosync/corosync.conf
<snip>
totem {
version: 2
token: 5000
</snip>
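As a minimal sketch, the token value can also be extracted by a small shell function for use in scripts. The function name totem_token_ms is hypothetical, and the parsing assumes the token line sits directly inside the totem { } section, as in the excerpt above.

```shell
# Hypothetical helper: print the totem token (milliseconds) from
# corosync.conf-style text on stdin. Prints nothing if no token is set.
# Usage: totem_token_ms < /etc/corosync/corosync.conf
totem_token_ms() {
    awk '
        /^[ \t]*#/ { next }                                  # skip comment lines
        depth == 0 && /^[ \t]*totem[ \t]*[{]/ { depth = 1; next }
        depth > 0 {
            if ($0 ~ /[{]/) depth++                          # nested section opens
            if ($0 ~ /[}]/) depth--                          # a section closes
            if (depth == 1 && $0 ~ /^[ \t]*token[ \t]*:/) {
                sub(/^[^:]*:[ \t]*/, "")                     # drop "token:" prefix
                sub(/[ \t]+$/, "")                           # drop trailing blanks
                print
                exit
            }
        }
    '
}
```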
Watchdog Timeout (Default=5): Time interval, in seconds, within which at least one response from the SBD device has to be received.
Message Wait Timeout (Default=10): The time delay, in seconds, incurred when another node sends the poison pill.
# /usr/sbin/sbd -d /dev/sdb1 dump
==Dumping header on disk /dev/sdb1
Header version : 2
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
==Header on disk /dev/sdb1 is dumped
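The individual timeouts can be pulled out of the dump output for use in scripts. The following is only a sketch: sbd_timeout is a hypothetical helper name, and it assumes the "Timeout (name) : value" layout shown in the dump above.

```shell
# Hypothetical helper: print one timeout from "sbd dump" output on stdin.
# $1 is the name inside the parentheses, e.g. watchdog or msgwait.
# Usage: /usr/sbin/sbd -d /dev/sdb1 dump | sbd_timeout msgwait
sbd_timeout() {
    awk -v name="$1" -F ':' '
        index($1, "Timeout (" name ")") > 0 {
            gsub(/[ \t]/, "", $2)    # strip blanks around the value
            print $2
            exit
        }
    '
}
```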
STONITH Timeout (Default=60): How long to wait for the STONITH action to complete. If the nvpair xml tag for the stonith-timeout is missing, the default of 60 seconds is assumed.
# /usr/sbin/cibadmin -Q
<snip>
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.6-b988976485d15cb702c9307df55512d323831a5e"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair id="cib-bootstrap-options-stonith-timeout" name="stonith-timeout" value="120"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1352238282"/>
</cluster_property_set>
</crm_config>
</snip>
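To read the effective STONITH timeout without scanning the XML by eye, a sketch such as the following may help. The function name stonith_timeout is hypothetical. It assumes each nvpair sits on one line with the name attribute before the value attribute, as in the output above, and it falls back to the default of 60 seconds described above when the nvpair is missing. It is a text match, not a real XML parser.

```shell
# Hypothetical helper: print the stonith-timeout value from CIB XML on stdin,
# or 60 (the default stated above) if no such nvpair is present.
# Usage: /usr/sbin/cibadmin -Q | stonith_timeout
stonith_timeout() {
    value=$(sed -n 's/.*name="stonith-timeout"[^>]*value="\([^"]*\)".*/\1/p' | head -n 1)
    echo "${value:-60}"
}
```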
Resolution
1. Watchdog < Message Wait < STONITH Timeout
2. Message Wait = 2 x Watchdog
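3. STONITH Timeout >= Message Wait + (Message Wait / 100 * 20), that is, at least 120% of Message Wait
4. TOTEM Token >= 5 seconds (token: 5000 or higher, since corosync.conf specifies the token in milliseconds)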
5. STONITH Timeout < 300 seconds
6. Watchdog <= 120 seconds
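The six rules above can be checked mechanically. The following is a minimal sketch, not a supported tool: check_sbd_timeouts is a hypothetical function name, the first three arguments are assumed to be whole seconds, and the token is assumed to be given in milliseconds as it appears in corosync.conf. Rule 3 is multiplied out (100 x STONITH >= 120 x Message Wait) so that shell integer division does not round the 20% margin down to zero.

```shell
# Hypothetical helper: check timeouts against the six rules above.
# Arguments: watchdog msgwait stonith_timeout (seconds) totem_token (milliseconds)
# Prints one line per violated rule; returns 0 only if every rule holds.
check_sbd_timeouts() {
    wd=$1; mw=$2; st=$3; tok_ms=$4
    rc=0
    if [ "$wd" -ge "$mw" ] || [ "$mw" -ge "$st" ]; then
        echo "rule 1: need Watchdog < Message Wait < STONITH Timeout"; rc=1
    fi
    if [ "$mw" -ne $((2 * wd)) ]; then
        echo "rule 2: need Message Wait = 2 x Watchdog"; rc=1
    fi
    if [ $((100 * st)) -lt $((120 * mw)) ]; then
        echo "rule 3: need STONITH Timeout >= Message Wait + 20%"; rc=1
    fi
    if [ "$tok_ms" -lt 5000 ]; then
        echo "rule 4: need TOTEM Token >= 5000 ms"; rc=1
    fi
    if [ "$st" -ge 300 ]; then
        echo "rule 5: need STONITH Timeout < 300"; rc=1
    fi
    if [ "$wd" -gt 120 ]; then
        echo "rule 6: need Watchdog <= 120"; rc=1
    fi
    return "$rc"
}

# Example: Watchdog 20, Message Wait 40, STONITH Timeout 90, token 5000 ms
check_sbd_timeouts 20 40 90 5000 && echo "all rules satisfied"
```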
Common Recommendations
Variable (seconds) | Default | Suggestion 1 | Suggestion 2 |
Watchdog | 5 | 20 | 30 |
Message Wait | 10 | 40 | 60 |
STONITH Timeout | 60 | 90 | 120 |
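Starting from a chosen Watchdog value, the Message Wait value and the smallest STONITH Timeout allowed by rules 2 and 3 follow by arithmetic. The sketch below illustrates this; derive_sbd_timeouts is a hypothetical function name, and the minimum is rounded up to a whole second.

```shell
# Hypothetical helper: derive Message Wait and the minimum STONITH Timeout
# (both in seconds) from a Watchdog value, per rules 2 and 3.
derive_sbd_timeouts() {
    wd=$1
    mw=$((2 * wd))                          # rule 2: Message Wait = 2 x Watchdog
    st_min=$(( (120 * mw + 99) / 100 ))     # rule 3: 120% of Message Wait, rounded up
    echo "watchdog=$wd msgwait=$mw stonith_timeout_min=$st_min"
}

derive_sbd_timeouts 20   # prints: watchdog=20 msgwait=40 stonith_timeout_min=48
```

The STONITH Timeout values in the table (60, 90 and 120) are all above the computed minimums (12, 48 and 72), so they leave additional margin.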
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7011346
- Creation Date: 12-Nov-2012
- Modified Date: 21-Oct-2021
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com