SBD-based Pacemaker fails to fence and stops with fatal error
This document (000021694) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 15 (All Service Packs)
SUSE Linux Enterprise Server for SAP Applications 15 (All Service Packs)
Situation
A Pacemaker cluster configured with an SBD STONITH fencing device attempts to fence a node after a failure. The failed node does not reboot. Instead, Pacemaker on that node terminates with exit status 100 and does not recover on its own. Manual cluster recovery attempts in this state may lead to data loss if executed without a firm understanding of the cluster's degraded condition.
The following error can be observed in pacemaker.log of the failed node:
Jan 28 22:03:44.378 node2 pacemakerd [18641] (pcmk_shutdown_worker) notice: Shutting down and staying down after fatal error
Jan 28 22:03:44.378 node2 pacemakerd [18641] (pcmkd_shutdown_corosync) info: Asking Corosync to shut down
Jan 28 22:03:44.394 node2 pacemakerd [18641] (crm_exit) info: Exiting pacemakerd | with status 100
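To check a node for these messages, the log can be searched for the two characteristic strings. The following is a minimal sketch, not an official tool: the function name has_fatal_shutdown is hypothetical, and the default log path /var/log/pacemaker/pacemaker.log is an assumption that may differ depending on the local logging configuration.

```shell
# Hypothetical helper: returns 0 if the given log contains pacemakerd's
# fatal-shutdown messages, non-zero otherwise.
# The default path is an assumption; pass the real log file as $1 if it differs.
has_fatal_shutdown() {
  log_file="${1:-/var/log/pacemaker/pacemaker.log}"
  # Match either the "staying down" notice or the exit with status 100.
  grep -Eq 'Shutting down and staying down after fatal error|Exiting pacemakerd.*with status 100' "$log_file"
}

# Usage on the failed node:
#   has_fatal_shutdown && echo "pacemakerd stopped after a fatal error"
```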
Resolution
The sbd.service systemd unit must be enabled. It is started by pacemaker.service as a dependency during the Pacemaker startup process.
sbd.service cannot be started manually by an administrator with systemctl; this is by design.
To enable and start sbd.service, run the following two commands:
# systemctl enable sbd.service
# systemctl restart pacemaker.service
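Before and after making this change, the unit's enablement state can be read with "systemctl is-enabled sbd.service". The sketch below only interprets that command's output; the function name interpret_sbd_state and the message wording are hypothetical and not part of any SUSE tooling.

```shell
# Hypothetical helper: interprets the state string printed by
# `systemctl is-enabled sbd.service` (passed in as $1).
interpret_sbd_state() {
  case "$1" in
    enabled)
      echo "OK: sbd.service is enabled"
      ;;
    *)
      # Covers "disabled", "masked", an empty string, and any other state.
      echo "WARNING: sbd.service is '${1:-unknown}', expected 'enabled'"
      return 1
      ;;
  esac
}

# Usage on a cluster node:
#   interpret_sbd_state "$(systemctl is-enabled sbd.service 2>/dev/null)"
```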
Cause
The sbd.service unit is disabled, which puts the cluster in an unsupported state.
Additional Information
The Cluster Information Base (CIB) can be checked to confirm the cause. If sbd.service was started when Pacemaker started up, "have-watchdog=true" is set in the "property cib-bootstrap-options" block. If sbd.service was not started with pacemaker.service, this value is set to false.
To output the contents of the CIB:
# crm configure show
Example of misconfiguration:
property cib-bootstrap-options: \
have-watchdog=false \
dc-version="2.1.5+20221208.a3f44794f-150500.6.20.1-2.1.5+20221208.a3f44794f" \
cluster-infrastructure=corosync \
cluster-name=hacluster \
stonith-enabled=true \
stonith-timeout=246 \
priority-fencing-delay=60
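To pull only the relevant property out of that output, the text can be filtered. This is a minimal sketch under the assumption that the property appears as an unquoted have-watchdog=<value> pair, as in the example above; the function name have_watchdog_value is hypothetical.

```shell
# Hypothetical helper: reads `crm configure show` output on stdin and prints
# the have-watchdog value, or "unset" if the property does not appear.
have_watchdog_value() {
  # Extract the value after "have-watchdog=" from the first matching line.
  value=$(sed -n 's/.*have-watchdog=\([A-Za-z]*\).*/\1/p' | head -n 1)
  echo "${value:-unset}"
}

# Usage on a cluster node:
#   crm configure show | have_watchdog_value
```

A result of "false" on an SBD-based cluster matches the misconfiguration described in this document.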
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 000021694
- Creation Date: 05-Feb-2025
- Modified Date: 18-Feb-2025
- SUSE Linux Enterprise High Availability Extension
- SUSE Linux Enterprise Server for SAP Applications
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com