SUSE Support

SBD Operation Guidelines for HAE Clusters

This document (7011346) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 15
SUSE Linux Enterprise Server 15
SUSE Linux Enterprise High Availability Extension 12
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise High Availability Extension 11 (HAE)
SUSE Linux Enterprise Server 11 (SLES)
Split Brain Detection (SBD)

Situation

SBD operations fail or do not complete in a timely manner. Several factors affect proper STONITH SBD functionality in an HAE cluster. This TID provides a brief set of guidelines. For additional details, see TID7009485 - SBD setup - debug and verify (OPENAIS), TID7023689 - How to safely change sbd timeout settings, and http://linux-ha.org/wiki/SBD_Fencing.

There are several variables associated with SBD functionality. Each variable and how to determine its value are shown below.

TOTEM Token (Default=5000): The time, in milliseconds, spent detecting a failure of a processor.
# cat /etc/corosync/corosync.conf
<snip>
totem {
    version:    2
    token:      5000
</snip>
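
As a minimal sketch, the token value can be pulled out of the configuration file with awk instead of reading it by eye. The helper name get_totem_token is hypothetical and not part of corosync, and the sketch assumes the "token: <value>" layout shown in the listing above (key and value separated by whitespace).

```shell
#!/bin/sh
# Print the totem token timeout (milliseconds) from a corosync.conf-style file.
# get_totem_token is a hypothetical helper name, not a corosync tool.
# Assumes the "token: <value>" layout with whitespace between key and value.
# Prints nothing and returns non-zero if no token line is present.
get_totem_token() {
    conf=${1:-/etc/corosync/corosync.conf}
    # Matching on the exact first field skips keys that merely start with
    # "token" and lines that are commented out.
    awk '$1 == "token:" { print $2; found = 1; exit } END { exit !found }' "$conf"
}
```

For example, `get_totem_token /etc/corosync/corosync.conf` prints 5000 for the configuration shown above.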

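Watchdog Timeout (Default=5): The interval, in seconds, within which at least one response from the SBD device must be received. Shown as Timeout (watchdog) in the dump output below.

Message Wait Timeout (Default=10): The delay, in seconds, incurred when another node sends the poison pill. Shown as Timeout (msgwait) in the dump output below.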
# /usr/sbin/sbd -d /dev/sdb1 dump
==Dumping header on disk /dev/sdb1
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 10
==Header on disk /dev/sdb1 is dumped
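
For scripting, a single timeout can be extracted from the dump output. The following is a sketch only: sbd_timeout is a hypothetical helper name, and it assumes the "Timeout (<name>) : <value>" layout shown in the dump above.

```shell
#!/bin/sh
# Read `sbd ... dump` output on stdin and print one timeout value (seconds).
# sbd_timeout is a hypothetical helper name, not part of the sbd package.
# $1 is the name inside the parentheses: watchdog, allocate, loop or msgwait.
sbd_timeout() {
    awk -F: -v key="Timeout ($1)" '
        # Strip the padding between the field label and the colon.
        { name = $1; sub(/[ \t]+$/, "", name) }
        name == key { gsub(/[ \t]/, "", $2); print $2; exit }
    '
}
```

Typical use would be `/usr/sbin/sbd -d /dev/sdb1 dump | sbd_timeout msgwait`, which prints 10 for the header shown above.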

STONITH Timeout (Default=60): How long to wait, in seconds, for the STONITH action to complete. If the nvpair XML element for stonith-timeout is missing, the default of 60 seconds is assumed.
# /usr/sbin/cibadmin -Q
<snip>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.6-b988976485d15cb702c9307df55512d323831a5e"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair id="cib-bootstrap-options-stonith-timeout" name="stonith-timeout" value="120"/>
        <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1352238282"/>
      </cluster_property_set>
    </crm_config>
</snip>
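
The lookup above, including the fallback to 60 when the nvpair is missing, can be sketched as a small filter over the cibadmin output. The helper name cib_stonith_timeout is hypothetical. The sketch assumes each nvpair sits on one line with the name attribute before the value attribute, as in the listing above, and it returns the value verbatim.

```shell
#!/bin/sh
# Read `cibadmin -Q` XML on stdin and print the stonith-timeout value.
# cib_stonith_timeout is a hypothetical helper name, not a Pacemaker tool.
# Assumes one nvpair per line with name="..." before value="...".
# Prints 60, the default this document states, when the nvpair is absent.
cib_stonith_timeout() {
    value=$(sed -n 's/.*<nvpair[^>]* name="stonith-timeout"[^>]* value="\([^"]*\)".*/\1/p' | head -n 1)
    echo "${value:-60}"
}
```

Typical use would be `/usr/sbin/cibadmin -Q | cib_stonith_timeout`, which prints 120 for the configuration shown above.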

Resolution

Principal guidelines for SBD functionality. These are guidelines only; if you deviate from them, make sure you know what you are doing.
1. Watchdog < Message Wait < STONITH Timeout
2. Message Wait = 2 x Watchdog
3. STONITH Timeout >= Message Wait + (Message Wait / 100 * 20), that is, Message Wait plus 20%
4. TOTEM Token >= 5 seconds (a token value of 5000 or more in corosync.conf)
5. STONITH Timeout < 300 seconds
6. Watchdog <= 120 seconds
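
The six guidelines above can be sketched as a quick sanity check. The function name check_sbd_timeouts and its argument order are assumptions made for this illustration. Watchdog, Message Wait and STONITH Timeout are passed in seconds and the TOTEM token in milliseconds, matching the units in which each value is stored.

```shell
#!/bin/sh
# check_sbd_timeouts WATCHDOG MSGWAIT STONITH_TIMEOUT TOKEN_MS
# Hypothetical helper: prints one line per guideline that is not met and
# returns 1 if any guideline is not met, 0 otherwise.
check_sbd_timeouts() {
    wd=$1 mw=$2 st=$3 token_ms=$4 rc=0
    { [ "$wd" -lt "$mw" ] && [ "$mw" -lt "$st" ]; } ||
        { echo "Guideline 1 not met: Watchdog < Message Wait < STONITH Timeout"; rc=1; }
    [ "$mw" -eq $((2 * wd)) ] ||
        { echo "Guideline 2 not met: Message Wait = 2 x Watchdog"; rc=1; }
    # Integer form of st >= mw + mw / 100 * 20, i.e. 5 * st >= 6 * mw.
    [ $((st * 5)) -ge $((mw * 6)) ] ||
        { echo "Guideline 3 not met: STONITH Timeout >= Message Wait + 20%"; rc=1; }
    [ "$token_ms" -ge 5000 ] ||
        { echo "Guideline 4 not met: TOTEM Token >= 5 seconds"; rc=1; }
    [ "$st" -lt 300 ] ||
        { echo "Guideline 5 not met: STONITH Timeout < 300 seconds"; rc=1; }
    [ "$wd" -le 120 ] ||
        { echo "Guideline 6 not met: Watchdog <= 120 seconds"; rc=1; }
    return $rc
}
```

With the values read in the Situation section (watchdog 5, msgwait 10, stonith-timeout 120, token 5000), `check_sbd_timeouts 5 10 120 5000` prints nothing and returns 0.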

Common Recommendations
Variable          Default   Suggestion 1   Suggestion 2
Watchdog          5         20             30
Message Wait      10        40             60
STONITH Timeout   60        90             120
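
For a watchdog value not covered by the recommendations, guidelines 2 and 3 give the Message Wait and the smallest acceptable STONITH Timeout. The following sketch shows the arithmetic; derive_sbd_timeouts is a hypothetical helper name, and the result for STONITH Timeout is a lower bound only, not a recommended setting.

```shell
#!/bin/sh
# derive_sbd_timeouts WATCHDOG
# Hypothetical helper: from a watchdog timeout in seconds, print the Message
# Wait given by guideline 2 and the minimum STONITH Timeout from guideline 3.
derive_sbd_timeouts() {
    wd=$1
    mw=$((2 * wd))
    # Message Wait plus 20%, rounded up to a whole second: ceil(mw * 6 / 5).
    min_st=$(( (mw * 6 + 4) / 5 ))
    echo "msgwait=$mw min_stonith_timeout=$min_st"
}
```

For a watchdog of 20 this prints `msgwait=40 min_stonith_timeout=48`; the STONITH Timeout of 90 suggested for that watchdog value is well above this minimum.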

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7011346
  • Creation Date: 12-Nov-2012
  • Modified Date: 21-Oct-2021
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server
