Fencing, crashes and hangs on systems with Multipath and OCFS2
This document (7000097) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise Server 9 All Support Packs
Situation
- OCFS2 fences a node
- The cluster is unstable
- Hangs or crashes are observed
Resolution
The polling interval for MPIO is the time in seconds between path checks. The default setting is 5; however, in many situations this number is raised. The default OCFS2 heartbeat threshold is 7. If the MPIO settings have not been changed from their defaults, no modifications should be needed.
To fix this situation, either lower the polling_interval in /etc/multipath.conf or raise the O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb, so that the polling_interval is lower than the O2CB_HEARTBEAT_THRESHOLD.
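The rule above can be checked mechanically. The following is a minimal sketch in Python, not a SUSE-provided tool: the function names are hypothetical, the parsing only looks for the two settings discussed here, and it assumes that a missing or empty value means the default named in this document (5 for polling_interval, 7 for O2CB_HEARTBEAT_THRESHOLD).

```python
import re

def read_polling_interval(multipath_conf_text, default=5):
    """Return polling_interval from multipath.conf text, or the assumed default."""
    for line in multipath_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and indentation
        match = re.fullmatch(r'polling_interval\s+"?(\d+)"?', line)
        if match:
            return int(match.group(1))
    return default

def read_heartbeat_threshold(o2cb_text, default=7):
    """Return O2CB_HEARTBEAT_THRESHOLD from sysconfig text, or the assumed default."""
    for line in o2cb_text.splitlines():
        line = line.split("#", 1)[0].strip()
        match = re.fullmatch(r'O2CB_HEARTBEAT_THRESHOLD="?(\d*)"?', line)
        if match and match.group(1):  # an empty value falls through to the default
            return int(match.group(1))
    return default

def settings_are_consistent(multipath_conf_text, o2cb_text):
    """True when polling_interval is lower than O2CB_HEARTBEAT_THRESHOLD."""
    return read_polling_interval(multipath_conf_text) < read_heartbeat_threshold(o2cb_text)

if __name__ == "__main__":
    with open("/etc/multipath.conf") as mp, open("/etc/sysconfig/o2cb") as o2cb:
        ok = settings_are_consistent(mp.read(), o2cb.read())
    print("consistent" if ok else "polling_interval is not lower than O2CB_HEARTBEAT_THRESHOLD")
```

Run as root on a cluster node, the script only reads the two files and prints a one-line verdict; it does not change any setting.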
non-working configuration
The following is an example of a non-working configuration.
/etc/multipath.conf
defaults {
udev_dir /dev
polling_interval 10
default_selector "round-robin 0"
default_path_grouping_policy multibus
rr_weight priorities
failback immediate
no_path_retry queue
}
devnode_blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z][[0-9]*]"
devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
device {
vendor DEC.*
product MSA[15]00
}
}
devices {
device {
vendor "COMPAQ"
product "MSA1000 VOLUME"
path_grouping_policy multibus
}
}
/etc/sysconfig/o2cb
O2CB_BOOTCLUSTER=HappyHippo
O2CB_HEARTBEAT_THRESHOLD=
working configuration
The following is an example of a working configuration.
/etc/multipath.conf
defaults {
udev_dir /dev
polling_interval 10
default_selector "round-robin 0"
default_path_grouping_policy multibus
rr_weight priorities
failback immediate
no_path_retry queue
}
devnode_blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z][[0-9]*]"
devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
device {
vendor DEC.*
product MSA[15]00
}
}
devices {
device {
vendor "COMPAQ"
product "MSA1000 VOLUME"
path_grouping_policy multibus
}
}
/etc/sysconfig/o2cb
O2CB_BOOTCLUSTER=HappyHippo
O2CB_HEARTBEAT_THRESHOLD=14
adjusting the threshold:
The O2CB_HEARTBEAT_THRESHOLD may need to be raised further. Start by setting it higher than the polling_interval, then increase it until the system is stable. It is not uncommon for the threshold to be as high as 45 or 60 on heavily loaded systems.
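When choosing a value, it can help to see what a threshold means in wall-clock time. The sketch below is not part of the original guidance: it assumes the relationship commonly given in the OCFS2 documentation, where the heartbeat runs every 2 seconds and the fence timeout is (threshold - 1) * 2 seconds. The function names are hypothetical; verify the relationship against the documentation for your OCFS2 version before relying on it.

```python
def threshold_to_seconds(threshold, iteration_seconds=2):
    """Approximate fence timeout in seconds for a given O2CB_HEARTBEAT_THRESHOLD.

    Assumes a 2-second heartbeat iteration (an assumption, see the text above).
    """
    return (threshold - 1) * iteration_seconds

def seconds_to_threshold(timeout_seconds, iteration_seconds=2):
    """Inverse of threshold_to_seconds: threshold needed for a desired timeout."""
    return timeout_seconds // iteration_seconds + 1

if __name__ == "__main__":
    for threshold in (7, 14, 45, 60):
        print(threshold, "->", threshold_to_seconds(threshold), "seconds")
```

Under that assumption, the default threshold of 7 corresponds to 12 seconds, and the value of 14 used in the working configuration corresponds to 26 seconds.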
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7000097
- Creation Date: 15-Apr-2008
- Modified Date: 16-Mar-2021
- SUSE Linux Enterprise Server
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com