How to determine the next Pacemaker Cluster Action
This document (7022764) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 11
Situation
Sometimes it is unclear how the cluster will react to a configuration change. This is especially true after an administrative handover, when a change has not been tried before, or after irregularities have been encountered.
One example (Example 1 below) would be a cluster that is in maintenance mode and should be taken out of it, but it is unclear what the next reaction will be: whether resources will be stopped, moved, or otherwise affected by the cluster.
Essentially, the cluster represents the what-is state, which should be compared with a what-if state.
Keep in mind that this is a general explanation of how to determine what will happen; it is by no means a blueprint.
Also, this approach only covers the next reaction of the cluster to an administrative change. It cannot and does not predict the cluster's reaction to unforeseen events: if a resource fails unexpectedly during the planned administrative change, the end result of the cluster action cannot be determined by this method.
Resolution
Example 1 starts with the cluster in maintenance mode:
bennevis:~ # crm_mon -1
Stack: corosync
Current DC: benromach (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Thu Mar 22 13:52:13 2018
Last change: Thu Mar 22 13:52:11 2018 by root via cibadmin on bennevis
2 nodes configured
6 resources configured
*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services
Online: [ bennevis benromach ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00] (unmanaged)
rsc_SAPHanaTop_HA1_HDB00 (ocf::suse:SAPHanaTopology): Started benromach (unmanaged)
rsc_SAPHanaTop_HA1_HDB00 (ocf::suse:SAPHanaTopology): Started bennevis (unmanaged)
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ benromach ]
Slaves: [ bennevis ]
killer (stonith:external/sbd): Started bennevis (unmanaged)
HANA_IP (ocf::heartbeat:IPaddr2): Started benromach (unmanaged)
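As a side note (an addition to the original procedure): whether maintenance mode is active can also be queried directly from the CIB with crm_attribute, as a quick cross-check of the crm_mon banner above. While the cluster is in maintenance, this should report value=true:
bennevis:~ # crm_attribute -G -n maintenance-mode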
For the simulation, the command
crm_simulate -Sx
will be used. It takes an XML file as input and simulates the transition the cluster would execute for that configuration. Since the what-is state is taken from the running cluster, the cluster has to be up and running on the node where the snapshot is created. To create the basis for the what-if state file, execute:
bennevis:~ # cibadmin -Q > /tmp/status
This queries the cluster configuration and writes the XML into the file /tmp/status.
For clarity and consistency, this can be copied to a working file:
bennevis:~ # cp /tmp/status /tmp/status_to_be
which can then be edited.
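As a sanity check (an addition to the original procedure), the simulation can first be run against the unmodified dump. Because the file still matches the running cluster, the Transition Summary should come back empty, which confirms the baseline:
bennevis:~ # crm_simulate -Sx /tmp/status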
Example 1: "what happens if maintenance mode is disabled"
bennevis:~ # vi /tmp/status_to_be
The section in question is:
<cib crm_feature_set="3.0.10" validate-with="pacemaker-2.5" epoch="16718" num_updates="2" admin_epoch="0" cib-last-written="Thu Mar 22 13:52:11 2018" update-origin="bennevis" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair name="have-watchdog" value="true" id="cib-bootstrap-options-have-watchdog"/>
<nvpair name="dc-version" value="1.1.15-21.1-e174ec8" id="cib-bootstrap-options-dc-version"/>
<nvpair name="cluster-infrastructure" value="corosync" id="cib-bootstrap-options-cluster-infrastructure"/>
<nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
<nvpair name="cluster-name" value="SAPTEST" id="cib-bootstrap-options-cluster-name"/>
<nvpair name="maintenance-mode" value="true" id="cib-bootstrap-options-maintenance-mode"/>
If one changes
<nvpair name="maintenance-mode" value="true" id="cib-bootstrap-options-maintenance-mode"/>
to
<nvpair name="maintenance-mode" value="false" id="cib-bootstrap-options-maintenance-mode"/>
then maintenance mode is disabled and the cluster is active again.
Save the file with this new content. This is now the what-if file.
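As an alternative to editing the file in vi, the same change can be made non-interactively, for example with sed (a sketch that assumes the nvpair line appears exactly as in the excerpt above):
bennevis:~ # sed -i 's/name="maintenance-mode" value="true"/name="maintenance-mode" value="false"/' /tmp/status_to_be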
To see what the next action of the cluster would be with this change, pass the file to
crm_simulate -Sx
as follows:
bennevis:~ # crm_simulate -Sx /tmp/status_to_be
Current cluster status:
Online: [ bennevis benromach ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]
Started: [ bennevis benromach ]
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ benromach ]
Slaves: [ bennevis ]
killer (stonith:external/sbd): Started bennevis
HANA_IP (ocf::heartbeat:IPaddr2): Started benromach
Transition Summary:
Executing cluster transition:
* Resource action: rsc_SAPHanaTop_HA1_HDB00 monitor=10000 on benromach
* Resource action: rsc_SAPHanaTop_HA1_HDB00 monitor=10000 on bennevis
Revised cluster status:
Online: [ bennevis benromach ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]
Started: [ bennevis benromach ]
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ benromach ]
Slaves: [ bennevis ]
killer (stonith:external/sbd): Started bennevis
HANA_IP (ocf::heartbeat:IPaddr2): Started benromach
As can be seen, the change from the what-is state (maintenance-mode=true) to the what-if state (maintenance-mode=false) would only trigger the monitor operation of rsc_SAPHanaTop_HA1_HDB00 as the next action.
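If the simulated transition is acceptable, the change can then be applied to the live cluster. With the crm shell shipped with the High Availability Extension, this would for example be (an addition to the original procedure, shown as one possible way to apply the change):
bennevis:~ # crm configure property maintenance-mode=false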
Example 2: "what happens if the node running the Master is put in standby"
As the output of Example 1 shows, the Master is running on benromach.
bennevis:~ # vi /tmp/status_to_be
The section in question is:
<node id="2" uname="benromach">
<instance_attributes id="nodes-2">
<nvpair id="nodes-2-hana_ha1_remoteHost" name="hana_ha1_remoteHost" value="bennevis"/>
<nvpair id="nodes-2-hana_ha1_site" name="hana_ha1_site" value="NBG"/>
<nvpair id="nodes-2-hana_ha1_vhost" name="hana_ha1_vhost" value="benromach"/>
<nvpair id="nodes-2-lpa_ha1_lpt" name="lpa_ha1_lpt" value="1521723116"/>
<nvpair id="nodes-2-hana_ha1_srmode" name="hana_ha1_srmode" value="syncmem"/>
<nvpair id="nodes-2-hana_ha1_op_mode" name="hana_ha1_op_mode" value="logreplay"/>
</instance_attributes>
</node>
and needs a standby nvpair with value "on" added, like this:
<node id="2" uname="benromach">
<instance_attributes id="nodes-2">
<nvpair id="nodes-2-hana_ha1_remoteHost" name="hana_ha1_remoteHost" value="bennevis"/>
<nvpair id="nodes-2-hana_ha1_site" name="hana_ha1_site" value="NBG"/>
<nvpair id="nodes-2-hana_ha1_vhost" name="hana_ha1_vhost" value="benromach"/>
<nvpair id="nodes-2-lpa_ha1_lpt" name="lpa_ha1_lpt" value="1521723116"/>
<nvpair id="nodes-2-hana_ha1_srmode" name="hana_ha1_srmode" value="syncmem"/>
<nvpair id="nodes-2-hana_ha1_op_mode" name="hana_ha1_op_mode" value="logreplay"/>
<nvpair id="nodes-2-standby" name="standby" value="on"/>
</instance_attributes>
</node>
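After hand-editing, it can be worth verifying that the file is still well-formed XML before simulating. A minimal check (an addition to the original procedure, assuming xmllint from libxml2 is installed, which is usually the case on SLES) prints nothing if the file is well-formed:
bennevis:~ # xmllint --noout /tmp/status_to_be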
bennevis:~ # crm_simulate -Sx /tmp/status_to_be
Current cluster status:
Node benromach (2): standby
Online: [ bennevis ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]
Started: [ bennevis benromach ]
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ benromach ]
Slaves: [ bennevis ]
killer (stonith:external/sbd): Started bennevis
HANA_IP (ocf::heartbeat:IPaddr2): Started benromach
Transition Summary:
* Stop rsc_SAPHanaTop_HA1_HDB00:0 (benromach)
* Demote rsc_SAPHana_HA1_HDB00:0 (Master -> Stopped benromach)
* Promote rsc_SAPHana_HA1_HDB00:1 (Slave -> Master bennevis)
* Move HANA_IP (Started benromach -> bennevis)
Executing cluster transition:
* Resource action: rsc_SAPHanaTop_HA1_HDB00 monitor=10000 on bennevis
* Resource action: rsc_SAPHana_HA1_HDB00 cancel=61000 on bennevis
* Pseudo action: msl_SAPHana_HA1_HDB00_demote_0
* Resource action: HANA_IP stop on benromach
* Resource action: rsc_SAPHana_HA1_HDB00 demote on benromach
* Pseudo action: msl_SAPHana_HA1_HDB00_demoted_0
* Pseudo action: msl_SAPHana_HA1_HDB00_stop_0
* Resource action: HANA_IP start on bennevis
* Resource action: rsc_SAPHana_HA1_HDB00 stop on benromach
* Pseudo action: msl_SAPHana_HA1_HDB00_stopped_0
* Pseudo action: cln_SAPHanaTop_HA1_HDB00_stop_0
* Pseudo action: msl_SAPHana_HA1_HDB00_promote_0
* Resource action: rsc_SAPHanaTop_HA1_HDB00 stop on benromach
* Pseudo action: cln_SAPHanaTop_HA1_HDB00_stopped_0
* Resource action: rsc_SAPHana_HA1_HDB00 promote on bennevis
* Pseudo action: msl_SAPHana_HA1_HDB00_promoted_0
* Pseudo action: all_stopped
* Resource action: rsc_SAPHana_HA1_HDB00 monitor=60000 on bennevis
Revised cluster status:
Node benromach (2): standby
Online: [ bennevis ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]
Started: [ bennevis ]
Stopped: [ benromach ]
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ bennevis ]
Stopped: [ benromach ]
killer (stonith:external/sbd): Started bennevis
HANA_IP (ocf::heartbeat:IPaddr2): Started bennevis
As can be seen, the change from the what-is state (no standby) to the what-if state (standby=on for the HANA primary) would lead the cluster to move the IP address and the HANA primary to the other node and to stop all resources on the former primary node.
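Once the simulated transition matches expectations, the node can actually be put into standby on the live cluster, for example with the crm shell (a sketch, not part of the original document; the node can later be brought back with crm node online benromach):
bennevis:~ # crm node standby benromach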
Additional Information
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7022764
- Creation Date: 22-Mar-2018
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com