Disk latency may cause unwanted node fencing

This document (7011350) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 15
SUSE Linux Enterprise High Availability Extension 12 
SUSE Linux Enterprise High Availability Extension 11 
 

Situation

Occasionally, a node reboots because SBD self-fenced it, even though fencing may not have been necessary. Messages like the following appear in the system logs:
sbd: [18584]: WARN: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18584]: WARN: Latency: No liveness for 5 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18584]: WARN: Latency: No liveness for 6 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18585]: WARN: Latency: 6 exceeded threshold 3 on disk /dev/disk/by-id/dm-uuid-mpath-3600508b40007015738922001340000
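
When many such warnings have accumulated, it can help to know the longest gap observed before choosing a new watchdog timeout. The following is a minimal shell sketch that extracts that value from log text on standard input. The function name max_liveness_gap is hypothetical, and the pattern assumes the "No liveness for N s" message format shown above.

```shell
#!/bin/sh
# Sketch: print the longest "No liveness" gap (in seconds) found in
# sbd warnings read from standard input. Assumes the message format
# shown in the log excerpt above; max_liveness_gap is a made-up name.
max_liveness_gap() {
    sed -n 's/.*Latency: No liveness for \([0-9][0-9]*\) s exceeds threshold.*/\1/p' |
        sort -n |
        tail -n 1
}

# Sample input copied from the messages above.
max_liveness_gap <<'EOF'
sbd: [18584]: WARN: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18584]: WARN: Latency: No liveness for 5 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18584]: WARN: Latency: No liveness for 6 s exceeds threshold of 3 s (healthy servants: 0)
EOF
```

For the sample input this prints 6. On a live system the function would read from wherever sbd messages are logged, which depends on the installation.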

The sbd partition metadata shows the following:
# /usr/sbin/sbd -d /dev/sdb1 dump
==Dumping header on disk /dev/sdb1
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 10
==Header on disk /dev/sdb1 is dumped

Resolution

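Check the health of the disks that hold the SBD partitions, then increase the watchdog timeout, add SBD partitions, or both. The msgwait value should be about twice the watchdog value, and the two must be changed at the same time. Because the timeouts are set when the device is created, the SBD devices have to be re-initialized. In the example that follows, -1 sets the watchdog timeout and -4 sets msgwait.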
 
hn1:~ # cat /etc/sysconfig/sbd 
SBD_DEVICE="/dev/sdb1;/dev/sdc1;/dev/sdd1"
SBD_OPTS="-W"

hn1:~ # sbd -1 10 -4 20 -d /dev/sdb1 -d /dev/sdc1 -d /dev/sdd1 create
Initializing device /dev/sdb1
Creating version 2 header on device 3
Initializing 255 slots on device 3
Device /dev/sdb1 is initialized.
Initializing device /dev/sdc1
Creating version 2 header on device 3
Initializing 255 slots on device 3
Device /dev/sdc1 is initialized.
Initializing device /dev/sdd1
Creating version 2 header on device 3
Initializing 255 slots on device 3
Device /dev/sdd1 is initialized.

hn1:~ # sbd -d /dev/sdb1 dump
==Dumping header on disk /dev/sdb1
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 10
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 20
==Header on disk /dev/sdb1 is dumped
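
After recreating the devices, the relationship between the two timeouts can be checked from the dump output. The following is a minimal sketch, assuming the dump format shown above; check_sbd_timeouts is a hypothetical helper name, and the check only tests that msgwait is at least twice the watchdog timeout.

```shell
#!/bin/sh
# Sketch: read "sbd dump" output on standard input and check that
# msgwait is at least twice the watchdog timeout.
# check_sbd_timeouts is a made-up name, not part of sbd.
check_sbd_timeouts() {
    awk -F':' '
        /^Timeout \(watchdog\)/ { w = $2 + 0 }
        /^Timeout \(msgwait\)/  { m = $2 + 0 }
        END {
            if (w == 0)          print "error: no watchdog timeout found"
            else if (m >= 2 * w) print "ok: msgwait=" m " watchdog=" w
            else                 print "warn: msgwait=" m " is less than twice watchdog=" w
        }'
}

# Sample input copied from the dump above.
check_sbd_timeouts <<'EOF'
==Dumping header on disk /dev/sdb1
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 10
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 20
==Header on disk /dev/sdb1 is dumped
EOF
```

On a cluster node the same function could be fed directly, for example with sbd -d /dev/sdb1 dump | check_sbd_timeouts.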

Cause

SBD self-fences the node if it cannot read from the device for longer than the watchdog timeout, which defaults to 5 seconds. This behavior is essential: a node that sends a fencing message through sbd relies on the message either being delivered or the target node having self-fenced because the device was unreadable.

You can increase the watchdog timeout to 10 or even 20 seconds. The timeouts are configured at creation time, so this requires recreating the SBD device. Take care to adjust the msgwait timeout at the same time, to approximately twice the watchdog timeout.
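
To keep the two values in step, the create command can be generated from the watchdog timeout alone. The following is a minimal sketch that only prints the command and does not run it; the function names are hypothetical, the device paths are placeholders, and msgwait is taken as exactly twice the watchdog timeout.

```shell
#!/bin/sh
# Sketch: build (but do not run) an "sbd create" command in which
# msgwait is twice the watchdog timeout. Function names are made up.

# msgwait should be about twice the watchdog timeout.
msgwait_for() {
    echo $(( $1 * 2 ))
}

# Usage: sbd_create_cmd WATCHDOG_SECONDS DEVICE [DEVICE...]
sbd_create_cmd() {
    watchdog=$1
    shift
    cmd="sbd -1 $watchdog -4 $(msgwait_for "$watchdog")"
    for dev in "$@"; do
        cmd="$cmd -d $dev"
    done
    echo "$cmd create"
}

# Placeholder devices; substitute the real SBD partitions.
sbd_create_cmd 10 /dev/sdb1 /dev/sdc1 /dev/sdd1
```

Printing the command first leaves room to review it, since creating the header re-initializes the SBD device.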

You can reduce the impact of latency on a single disk by adding SBD partitions. For example, with three SBD partitions, at least two of the devices would need to exceed the latency threshold before a self-fence occurs.
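
The arithmetic behind the three-device example can be sketched as follows. This assumes a simple majority rule (a node stays up while more than half of its SBD devices are healthy), which matches the one-device and three-device cases; other device counts, in particular two devices, may behave differently. The function name failures_to_fence is hypothetical.

```shell
#!/bin/sh
# Sketch, assuming a simple majority rule over SBD devices:
# how many devices must exceed the latency threshold at the same
# time before the node loses its majority and self-fences.
# failures_to_fence is a made-up name.
failures_to_fence() {
    n=$1
    echo $(( n - n / 2 ))
}

for devices in 1 3; do
    echo "$devices device(s): $(failures_to_fence "$devices") must fail"
done
```

With one device a single slow disk is enough to trigger a self-fence, while with three devices two must be affected at once.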

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7011350
  • Creation Date: 12-Nov-2012
  • Modified Date: 24-Aug-2022
    • SUSE Linux Enterprise High Availability Extension


For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com
