SBD STONITH fails to fence other node when using the fully qualified DNS name.
This document (000019877) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 15 SP2
SUSE Linux Enterprise High Availability Extension 12 SP4
SUSE Linux Enterprise High Availability Extension 12 SP5
Situation
Errors from /var/log/messages from <node1>
node1 stonith-ng[3008]: notice: Couldn't find anyone to fence (reboot) node2.example.com with any device
node1 stonith-ng[3008]: error: Operation reboot of node2.example.com by <no-one> for crmd.3012@node2.example.com: No such device
node1 crmd[3012]: notice: Stonith operation 2/1:0:0:edac53d5-64ec-4650-a447-5aa2a5fc004a: No such device (-19)
node1 crmd[3012]: notice: Stonith operation 2 for node2.example.com failed (No such device): aborting transition.
node1 crmd[3012]: warning: No devices found in cluster to fence node2.example.com, giving up
Resolution
Preferred Solution:
Modify /etc/corosync/corosync.conf to use the DNS short name or the IP address under nodelist --> node --> ring0_addr.
If the short name is used rather than the IP address, it is also recommended to add an entry to /etc/hosts to eliminate the dependency on an external DNS server.
Example:
nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
}
Although FQDNs (fully qualified domain names) are supported by Pacemaker, Corosync, and SBD, using them requires the additional configuration outlined in the Optional Solution below.
Optional Solution:
If an FQDN is used for ring0_addr in /etc/corosync/corosync.conf, follow these steps:
- Remove /etc/sysconfig/sbd from /etc/csync2/csync2.cfg so that it is not synchronized across cluster nodes. This file must then be managed outside of csync2, because it will differ on each node.
- Use the "-n node" option for SBD. Reference: man sbd (8)
For example, set SBD_OPTS="-n <FQDN of node2>" in /etc/sysconfig/sbd on the second node.
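The per-node edit described above could be scripted roughly as follows. This is a minimal sketch, not a SUSE-provided tool: the function name set_sbd_node_name is hypothetical, it assumes GNU sed, and it replaces any options already present in SBD_OPTS, so review the file before running it on a real node.

```shell
# Hypothetical helper: set SBD_OPTS="-n <node name>" in an sbd sysconfig file.
# $1 = path to the sysconfig file, $2 = node name as used in corosync.conf.
# Assumption: overwriting any existing SBD_OPTS value is acceptable.
set_sbd_node_name() {
    file=$1
    node=$2
    if grep -q '^SBD_OPTS=' "$file"; then
        # Replace the existing SBD_OPTS line in place (GNU sed -i).
        sed -i "s|^SBD_OPTS=.*|SBD_OPTS=\"-n $node\"|" "$file"
    else
        # No SBD_OPTS line yet: append one.
        printf 'SBD_OPTS="-n %s"\n' "$node" >> "$file"
    fi
}

# Possible usage on the second node (run as root):
#   set_sbd_node_name /etc/sysconfig/sbd node2.example.com
```

Each node would be given its own name, which is why the file can no longer be distributed by csync2.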
Cause
Additional Information
The name Pacemaker uses is:
1) The value stored in corosync.conf under ring0_addr in the nodelist, if it does not contain an IP address; otherwise
2) The value stored in corosync.conf under name in the nodelist; otherwise
3) The value of uname -n
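The lookup order above can be illustrated with a small shell sketch. It is a simplified model of the three rules as listed here, not Pacemaker's actual code; the function names is_ip_like and pick_node_name are hypothetical, and the IP check is deliberately crude (any colon counts as IPv6, digits and dots only count as IPv4).

```shell
# Hypothetical helper: crude test for whether a string looks like an IP address.
is_ip_like() {
    case $1 in
        *:*)        return 0 ;;  # contains a colon: treat as IPv6
        *[!0-9.]*)  return 1 ;;  # contains other characters: treat as a name
        *.*)        return 0 ;;  # only digits and dots: treat as IPv4
        *)          return 1 ;;
    esac
}

# Hypothetical helper: apply the three rules in order.
# $1 = ring0_addr value, $2 = name value (may be empty), $3 = output of uname -n
pick_node_name() {
    if [ -n "$1" ] && ! is_ip_like "$1"; then
        echo "$1"   # rule 1: ring0_addr that is not an IP address
    elif [ -n "$2" ]; then
        echo "$2"   # rule 2: name from the nodelist
    else
        echo "$3"   # rule 3: fall back to uname -n
    fi
}

pick_node_name node2.example.com "" node2   # prints node2.example.com
pick_node_name 192.0.2.12 node2 host2       # prints node2
pick_node_name 192.0.2.12 "" host2          # prints host2
```

The first call mirrors the situation in this document: ring0_addr holds an FQDN, so the cluster knows the node as node2.example.com even though uname -n returns node2.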
Beware: If `uname -n` does not match the name of the node in the cluster configuration, you will need to pass the advertised name to SBD with the `-n` option.
Example: SBD_OPTS="-n <FQDN host name>"
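A quick way to see whether a node is affected is to compare the two names directly. The sketch below is an illustration under assumptions: sbd_opts_hint is a hypothetical function name, and the suggested usage relies on crm_node -n to print the name the cluster uses for the local node.

```shell
# Hypothetical helper: suggest an SBD_OPTS value for this node.
# $1 = name the cluster uses for the node, $2 = output of uname -n
sbd_opts_hint() {
    cluster_name=$1
    kernel_name=$2
    if [ "$cluster_name" = "$kernel_name" ]; then
        # Names match: SBD's default (uname -n) is already correct.
        echo "names match, no -n option needed"
    else
        # Names differ: SBD must be told the name the cluster advertises.
        echo "SBD_OPTS=\"-n $cluster_name\""
    fi
}

# Possible usage on a running cluster node:
#   sbd_opts_hint "$(crm_node -n)" "$(uname -n)"
```

With the names from the log messages in this document, sbd_opts_hint node2.example.com node2 would suggest SBD_OPTS="-n node2.example.com".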
The sbd(8) manual page explains how SBD works in this regard:
-n node
Set local node name; defaults to "uname -n". This should not need to be set.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000019877
- Creation Date: 11-Feb-2021
- Modified Date: 12-Feb-2021
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com