SUSE Support

Fencing failing with No fence device

This document (000021466) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 15
SUSE Linux Enterprise Server for SAP Applications 15 

Situation

A two-node cluster is configured with three SBD disks. The environment is laid out as follows:
  • Site A : Node1 + SBD1
  • Site B : Node2 + SBD2
  • Site C : SBD3
Site A had a power outage and lost connectivity. Node2 then failed to fence Node1 and reported a "No fence device" error, even though Node2 could still access SBD2 and SBD3.

The log contains messages similar to the following:
May 07 14:24:35 aaha02 sbd[9252]:  warning: open_device: Opening device /dev/disk/by-id/scsi-360014050a4630178a014c718e83739cd failed.
May 07 14:24:35 aaha02 external/sbd(stonith-sbd)[9455]: ERROR: sbd list failed: == disk /dev/disk/by-id/scsi-360014050a4630178a014c718e83739cd unreadable!
May 07 14:24:36 aaha02 stonith[9091]: external_status: 'sbd status' failed with rc 1
May 07 14:24:36 aaha02 stonith[9091]: external/sbd device not accessible.
May 07 14:24:36 aaha02 pacemaker-fenced[5887]:  warning: fence_legacy[9058] stderr: [ == disk /dev/disk/by-id/scsi-360014050a4630178a014c718e83739cd unreadable! ]
May 07 14:24:36 aaha02 pacemaker-fenced[5887]:  warning: fence_legacy[9058] stderr: [ ==Header on disk /dev/disk/by-id/scsi-360014050a4630178a014c718e83739cd NOT dumped ]
May 07 14:24:36 aaha02 pacemaker-fenced[5887]:  warning: fence_legacy[9058] stderr: [ sbd failed; please check the logs. ]
May 07 14:24:36 aaha02 pacemaker-fenced[5887]:  warning: fence_legacy[9058] stderr: [ logd is not running ]
May 07 14:24:36 aaha02 pacemaker-fenced[5887]:  notice: Couldn't find anyone to fence (reboot) aaha01 using any device
May 07 14:24:36 aaha02 pacemaker-fenced[5887]:  error: Operation 'reboot' targeting aaha01 by unknown node for pacemaker-controld.5891@aaha02: Error occurred (No fence device)
May 07 14:24:36 aaha02 pacemaker-controld[5891]:  warning: Fence operation 3 for aaha01 failed: No fence device (aborting transition and giving up for now)
May 07 14:24:36 aaha02 pacemaker-controld[5891]:  notice: Transition 3 aborted: Stonith failed
May 07 14:24:36 aaha02 pacemaker-controld[5891]:  notice: Peer aaha01 was not terminated (reboot) by the cluster on behalf of pacemaker-controld.5891@aaha02: No fence device

 

Resolution

To fix the issue, set stonith-timeout to a value larger than the time the underlying subsystem needs to check the availability of the SBD devices.

If the SBD devices are provided by iSCSI, stonith-timeout needs to be larger than:

SCSI timeout + sbd msgwait + pcmk_delay_max + 20% wiggle room
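
The formula above can be worked through with a small calculation. The following is a minimal sketch, not a SUSE-provided tool: the function name `min_stonith_timeout` is hypothetical, the example values for msgwait (10 s) and pcmk_delay_max (30 s) are illustrative assumptions, and it assumes the 20% margin applies to the sum of the three timeouts. Read the real values from your own cluster before relying on the result.

```python
def min_stonith_timeout(scsi_timeout, msgwait, pcmk_delay_max, margin_percent=20):
    """Return the threshold (in seconds) that stonith-timeout should exceed.

    Assumption: the margin is applied to the sum of the three timeouts.
    Integer arithmetic is used so the result is rounded up to a whole second.
    """
    total = scsi_timeout + msgwait + pcmk_delay_max
    return (total * (100 + margin_percent) + 99) // 100


if __name__ == "__main__":
    # 120 s is the iSCSI default named in this document;
    # 10 s msgwait and 30 s pcmk_delay_max are example values only.
    threshold = min_stonith_timeout(scsi_timeout=120, msgwait=10, pcmk_delay_max=30)
    print(f"stonith-timeout should be larger than {threshold} seconds")
```

With these example inputs the sum is 160 seconds and the margin adds 32 seconds, so stonith-timeout would need to be larger than 192 seconds.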

Cause

This issue occurs when the cluster stonith-timeout is too low. The fencing subsystem checks the SBD devices first, and this check takes longer than the cluster stonith-timeout value allows.

For example, if the SBD devices come from iSCSI storage, the default timeout is 120 seconds.

The 120 seconds result from a login timeout of 15 seconds multiplied by 8 login attempts:

node.conn[0].timeo.login_timeout = 15
node.session.initial_login_retry_max = 8
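
The arithmetic behind the 120 seconds can be reproduced from the two settings shown above. This is a minimal sketch: the helper names `parse_iscsi_settings` and `worst_case_login_seconds` are hypothetical, and the input is the two lines quoted in this document pasted as a string rather than read from a live system.

```python
def parse_iscsi_settings(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def worst_case_login_seconds(settings):
    """Login timeout multiplied by the maximum number of login attempts."""
    timeout = int(settings["node.conn[0].timeo.login_timeout"])
    retries = int(settings["node.session.initial_login_retry_max"])
    return timeout * retries


if __name__ == "__main__":
    example = """
    node.conn[0].timeo.login_timeout = 15
    node.session.initial_login_retry_max = 8
    """
    print(worst_case_login_seconds(parse_iscsi_settings(example)))
```

With the defaults quoted above this yields 120, which is the value to use as the SCSI timeout term when sizing stonith-timeout.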

Additional Information

For more information, kindly check: 

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 000021466
  • Creation Date: 13-Jun-2024
  • Modified Date: 21-Jun-2024
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server for SAP Applications
