All HAE nodes fail to start clustering after reboot
This document (7011302) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 11 (SLES)
Split Brain Detection (SBD) Partitions
Situation
Starting OpenAIS/Corosync daemon (corosync): Starting SBD - SBD failed to start; aborting.
Failed services in runlevel 3: openais
The /etc/sysconfig/sbd configuration file on all nodes shows:
SBD_DEVICE="/dev/sdc1;/dev/sdd1;/dev/sde1"
SBD_OPTS="-W"
# /usr/sbin/cibadmin -Q
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
The stonith resource in the /var/lib/heartbeat/crm/cib.xml shows:
<primitive class="stonith" id="stonith-sbd" type="external/sbd">
<instance_attributes id="stonith-sbd-instance_attributes">
<nvpair id="stonith-sbd-instance_attributes-sbd_device" name="sbd_device" value="/dev/sdb1;/dev/sdc1;/dev/sdd1"/>
</instance_attributes>
</primitive>
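The mismatch between the two device lists shown above can be checked with a short script. This is a minimal sketch and not part of the original procedure. It assumes the sbd_device nvpair sits on a single line of cib.xml, as in the excerpt above, and the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: compare SBD_DEVICE in the sysconfig file with sbd_device in the CIB.
# Function names and the single-line nvpair layout are assumptions.

# Print the SBD_DEVICE value from a sysconfig-style file.
sysconfig_devices() {
    sed -n 's/^SBD_DEVICE="\(.*\)"$/\1/p' "$1"
}

# Print the value of the sbd_device nvpair from a CIB XML file.
cib_devices() {
    sed -n 's/.*name="sbd_device" value="\([^"]*\)".*/\1/p' "$1"
}

# Return 0 if both lists are identical strings, 1 otherwise.
compare_sbd_devices() {
    local conf cib
    conf=$(sysconfig_devices "$1")
    cib=$(cib_devices "$2")
    if [ "$conf" = "$cib" ]; then
        echo "MATCH: $conf"
    else
        echo "MISMATCH: sysconfig=$conf cib=$cib"
        return 1
    fi
}

# Usage:
#   compare_sbd_devices /etc/sysconfig/sbd /var/lib/heartbeat/crm/cib.xml
```

The comparison is a plain string match, so the same devices listed in a different order are also reported as a mismatch.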
Resolution
Method 1 when CIB Database is Correct
For example, the stonith resource in the /var/lib/heartbeat/crm/cib.xml shows:
<primitive class="stonith" id="stonith-sbd" type="external/sbd">
<instance_attributes id="stonith-sbd-instance_attributes">
<nvpair id="stonith-sbd-instance_attributes-sbd_device" name="sbd_device" value="/dev/sdb1;/dev/sdc1;/dev/sdd1"/>
</instance_attributes>
</primitive>
1. On one node, modify the /etc/sysconfig/sbd file.
2. Change the SBD_DEVICE variable to match the CIB database.
SBD_DEVICE="/dev/sdb1;/dev/sdc1;/dev/sdd1"
SBD_OPTS="-W"
3. Save the file and copy /etc/sysconfig/sbd to all nodes in the cluster.
scp /etc/sysconfig/sbd node2:/etc/sysconfig/sbd
4. Recreate the sbd partitions as listed in the CIB database
sbd -d /dev/sdb1 -d /dev/sdc1 -d /dev/sdd1 create
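The create command in step 4 can be derived from the SBD_DEVICE value so that the two cannot drift apart. The following is a minimal sketch under that assumption. build_sbd_create is a hypothetical helper that only prints the command, so it can be reviewed before it is run.

```shell
#!/bin/bash
# Sketch: turn a semicolon-separated SBD_DEVICE list into the
# "sbd -d ... create" command line. build_sbd_create is a hypothetical
# helper; it prints the command and does not touch any device.

build_sbd_create() {
    local list="$1" dev cmd="sbd"
    local IFS=';'            # split the list on semicolons only
    for dev in $list; do
        cmd="$cmd -d $dev"
    done
    echo "$cmd create"
}

# Usage:
#   . /etc/sysconfig/sbd
#   build_sbd_create "$SBD_DEVICE"
```

With the SBD_DEVICE value from step 2, the helper prints the same command line as shown in step 4.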
Method 2 when /etc/sysconfig/sbd is Correct
The correct /etc/sysconfig/sbd shows:
SBD_DEVICE="/dev/sdc1;/dev/sdd1;/dev/sde1"
SBD_OPTS="-W"
1. Rename the /etc/sysconfig/sbd file to /etc/sysconfig/sbd.save on all nodes in the cluster.
mv /etc/sysconfig/sbd /etc/sysconfig/sbd.save
2. Reboot all nodes in the cluster
3. Delete the stonith resource, then recreate it either with the correct sbd_device list or without a parameter list.
Assuming a stonith resource named stonith-sbd:
crm_resource --delete --resource stonith-sbd --resource-type primitive
crm configure primitive stonith-sbd stonith:external/sbd params sbd_device="/dev/sdc1;/dev/sdd1;/dev/sde1"
-OR-
crm configure primitive stonith-sbd stonith:external/sbd
4. Rename the /etc/sysconfig/sbd.save back to /etc/sysconfig/sbd on all nodes in the cluster
mv /etc/sysconfig/sbd.save /etc/sysconfig/sbd
5. Reboot all nodes in the cluster
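Steps 1 and 4 have to be repeated on every node. The following is a minimal sketch of how the per-node commands could be generated, assuming root ssh access between nodes. The node names are hypothetical, and print_sbd_rename only prints the commands rather than running them.

```shell
#!/bin/bash
# Sketch: print the rename command for every node (steps 1 and 4).
# Node names are hypothetical; nothing is executed on the nodes.

# $1 = "save" (step 1) or "restore" (step 4); remaining args = node names.
print_sbd_rename() {
    local mode="$1" node src dst
    shift
    case "$mode" in
        save)    src=/etc/sysconfig/sbd;      dst=/etc/sysconfig/sbd.save ;;
        restore) src=/etc/sysconfig/sbd.save; dst=/etc/sysconfig/sbd ;;
        *) echo "usage: print_sbd_rename save|restore NODE..." >&2; return 2 ;;
    esac
    for node in "$@"; do
        echo "ssh $node mv $src $dst"
    done
}

# Usage:
#   print_sbd_rename save node1 node2
#   print_sbd_rename restore node1 node2
```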
Cause
The SBD_DEVICE list in /etc/sysconfig/sbd does not match the sbd_device list of the stonith resource in the CIB database.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7011302
- Creation Date: 02-Nov-2012
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com