How to determine the next Pacemaker Cluster Action
This document (7022764) is provided subject to the disclaimer at the end of this document.
Environment
SUSE Linux Enterprise High Availability Extension 11
Situation
Sometimes it is unclear how the cluster will react to a configuration change. This is especially true after an administrative handover, when a change has not been tried before, or after irregularities have been encountered.
One example (Example 1 below) would be a cluster that is in maintenance mode and should be taken out of it, but it is unclear what the next reaction will be: whether resources will be stopped, moved, or otherwise affected by the cluster.
Essentially, the cluster represents the what-is state, which should be compared with a what-if state.
Keep in mind that this is a general explanation of how to determine what will happen; it is by no means a blueprint.
Also, this approach only covers the next reaction of the cluster to an administrative change. It cannot and does not predict the cluster's reaction to unforeseen events: if a resource fails unexpectedly during the planned administrative change, the end result of the cluster action cannot be determined by this method.
Resolution
Example 1 starts with the cluster in maintenance mode:
bennevis:~ # crm_mon -1
Stack: corosync
Current DC: benromach (version 1.1.15-21.1-e174ec8) - partition with quorum
Last updated: Thu Mar 22 13:52:13 2018
Last change: Thu Mar 22 13:52:11 2018 by root via cibadmin on bennevis
2 nodes configured
6 resources configured
*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services
Online: [ bennevis benromach ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00] (unmanaged)
rsc_SAPHanaTop_HA1_HDB00 (ocf::suse:SAPHanaTopology): Started benromach (unmanaged)
rsc_SAPHanaTop_HA1_HDB00 (ocf::suse:SAPHanaTopology): Started bennevis (unmanaged)
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ benromach ]
Slaves: [ bennevis ]
killer (stonith:external/sbd): Started bennevis (unmanaged)
HANA_IP (ocf::heartbeat:IPaddr2): Started benromach (unmanaged)
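As a side note (an addition to the original procedure): whether maintenance mode is active can also be queried directly from the CIB with crm_attribute, as a quick cross-check of the crm_mon banner above. While the cluster is in maintenance, this should report value=true:
bennevis:~ # crm_attribute -G -n maintenance-mode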
For the simulation, the command
crm_simulate -Sx
will be used. It takes an XML file as input and simulates the transition the cluster would execute for that configuration. Since the what-is state is taken from the running cluster, the cluster has to be up and running on the node where the snapshot is created. To create the basis for the what-if state file, execute:
bennevis:~ # cibadmin -Q > /tmp/status
This queries the cluster configuration and writes the XML into the file /tmp/status.
For clarity and consistency, this can be copied to a working file:
bennevis:~ # cp /tmp/status /tmp/status_to_be
which can then be edited.
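As a sanity check (an addition to the original procedure), the simulation can first be run against the unmodified dump. Because the file still matches the running cluster, the Transition Summary should come back empty, which confirms the baseline:
bennevis:~ # crm_simulate -Sx /tmp/status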
Example 1: "what happens if maintenance mode is disabled"
bennevis:~ # vi /tmp/status_to_be
The section in question is:
<cib crm_feature_set="3.0.10" validate-with="pacemaker-2.5" epoch="16718" num_updates="2" admin_epoch="0" cib-last-written="Thu Mar 22 13:52:11 2018" update-origin="bennevis" update-client="cibadmin" update-user="root" have-quorum="1" dc-uuid="2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair name="have-watchdog" value="true" id="cib-bootstrap-options-have-watchdog"/>
<nvpair name="dc-version" value="1.1.15-21.1-e174ec8" id="cib-bootstrap-options-dc-version"/>
<nvpair name="cluster-infrastructure" value="corosync" id="cib-bootstrap-options-cluster-infrastructure"/>
<nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
<nvpair name="cluster-name" value="SAPTEST" id="cib-bootstrap-options-cluster-name"/>
<nvpair name="maintenance-mode" value="true" id="cib-bootstrap-options-maintenance-mode"/>
If one changes
<nvpair name="maintenance-mode" value="true" id="cib-bootstrap-options-maintenance-mode"/>
to
<nvpair name="maintenance-mode" value="false" id="cib-bootstrap-options-maintenance-mode"/>
then maintenance mode is disabled and the cluster is active again.
Save the file with this new content. This is now the what-if file.
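As an alternative to editing the file in vi, the same change can be made non-interactively, for example with sed (a sketch that assumes the nvpair line appears exactly as in the excerpt above):
bennevis:~ # sed -i 's/name="maintenance-mode" value="true"/name="maintenance-mode" value="false"/' /tmp/status_to_be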
To see what the next action of the cluster would be with this change, pass the file to
crm_simulate -Sx
as follows:
bennevis:~ # crm_simulate -Sx /tmp/status_to_be
Current cluster status:
Online: [ bennevis benromach ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]
Started: [ bennevis benromach ]
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ benromach ]
Slaves: [ bennevis ]
killer (stonith:external/sbd): Started bennevis
HANA_IP (ocf::heartbeat:IPaddr2): Started benromach
Transition Summary:
Executing cluster transition:
* Resource action: rsc_SAPHanaTop_HA1_HDB00 monitor=10000 on benromach
* Resource action: rsc_SAPHanaTop_HA1_HDB00 monitor=10000 on bennevis
Revised cluster status:
Online: [ bennevis benromach ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]
Started: [ bennevis benromach ]
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ benromach ]
Slaves: [ bennevis ]
killer (stonith:external/sbd): Started bennevis
HANA_IP (ocf::heartbeat:IPaddr2): Started benromach
As can be seen, the change from the what-is state (maintenance-mode=true) to the what-if state (maintenance-mode=false) would only trigger the monitor operation of rsc_SAPHanaTop_HA1_HDB00 as the next action.
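If the simulated transition is acceptable, the change can then be applied to the live cluster. With the crm shell shipped with the High Availability Extension, this would for example be (an addition to the original procedure, shown as one possible way to apply the change):
bennevis:~ # crm configure property maintenance-mode=false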
Example 2: "what happens if the node running the Master is put in standby"
As the output of Example 1 shows, the Master is running on benromach.
bennevis:~ # vi /tmp/status_to_be
The section in question is:
<node id="2" uname="benromach">
<instance_attributes id="nodes-2">
<nvpair id="nodes-2-hana_ha1_remoteHost" name="hana_ha1_remoteHost" value="bennevis"/>
<nvpair id="nodes-2-hana_ha1_site" name="hana_ha1_site" value="NBG"/>
<nvpair id="nodes-2-hana_ha1_vhost" name="hana_ha1_vhost" value="benromach"/>
<nvpair id="nodes-2-lpa_ha1_lpt" name="lpa_ha1_lpt" value="1521723116"/>
<nvpair id="nodes-2-hana_ha1_srmode" name="hana_ha1_srmode" value="syncmem"/>
<nvpair id="nodes-2-hana_ha1_op_mode" name="hana_ha1_op_mode" value="logreplay"/>
</instance_attributes>
</node>
and needs a standby nvpair with value "on" added, like this:
<node id="2" uname="benromach">
<instance_attributes id="nodes-2">
<nvpair id="nodes-2-hana_ha1_remoteHost" name="hana_ha1_remoteHost" value="bennevis"/>
<nvpair id="nodes-2-hana_ha1_site" name="hana_ha1_site" value="NBG"/>
<nvpair id="nodes-2-hana_ha1_vhost" name="hana_ha1_vhost" value="benromach"/>
<nvpair id="nodes-2-lpa_ha1_lpt" name="lpa_ha1_lpt" value="1521723116"/>
<nvpair id="nodes-2-hana_ha1_srmode" name="hana_ha1_srmode" value="syncmem"/>
<nvpair id="nodes-2-hana_ha1_op_mode" name="hana_ha1_op_mode" value="logreplay"/>
<nvpair id="nodes-2-standby" name="standby" value="on"/>
</instance_attributes>
</node>
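After hand-editing, it can be worth verifying that the file is still well-formed XML before simulating. A minimal check (an addition to the original procedure, assuming xmllint from libxml2 is installed, which is usually the case on SLES) prints nothing if the file is well-formed:
bennevis:~ # xmllint --noout /tmp/status_to_be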
bennevis:~ # crm_simulate -Sx /tmp/status_to_be
Current cluster status:
Node benromach (2): standby
Online: [ bennevis ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]
Started: [ bennevis benromach ]
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ benromach ]
Slaves: [ bennevis ]
killer (stonith:external/sbd): Started bennevis
HANA_IP (ocf::heartbeat:IPaddr2): Started benromach
Transition Summary:
* Stop rsc_SAPHanaTop_HA1_HDB00:0 (benromach)
* Demote rsc_SAPHana_HA1_HDB00:0 (Master -> Stopped benromach)
* Promote rsc_SAPHana_HA1_HDB00:1 (Slave -> Master bennevis)
* Move HANA_IP (Started benromach -> bennevis)
Executing cluster transition:
* Resource action: rsc_SAPHanaTop_HA1_HDB00 monitor=10000 on bennevis
* Resource action: rsc_SAPHana_HA1_HDB00 cancel=61000 on bennevis
* Pseudo action: msl_SAPHana_HA1_HDB00_demote_0
* Resource action: HANA_IP stop on benromach
* Resource action: rsc_SAPHana_HA1_HDB00 demote on benromach
* Pseudo action: msl_SAPHana_HA1_HDB00_demoted_0
* Pseudo action: msl_SAPHana_HA1_HDB00_stop_0
* Resource action: HANA_IP start on bennevis
* Resource action: rsc_SAPHana_HA1_HDB00 stop on benromach
* Pseudo action: msl_SAPHana_HA1_HDB00_stopped_0
* Pseudo action: cln_SAPHanaTop_HA1_HDB00_stop_0
* Pseudo action: msl_SAPHana_HA1_HDB00_promote_0
* Resource action: rsc_SAPHanaTop_HA1_HDB00 stop on benromach
* Pseudo action: cln_SAPHanaTop_HA1_HDB00_stopped_0
* Resource action: rsc_SAPHana_HA1_HDB00 promote on bennevis
* Pseudo action: msl_SAPHana_HA1_HDB00_promoted_0
* Pseudo action: all_stopped
* Resource action: rsc_SAPHana_HA1_HDB00 monitor=60000 on bennevis
Revised cluster status:
Node benromach (2): standby
Online: [ bennevis ]
Clone Set: cln_SAPHanaTop_HA1_HDB00 [rsc_SAPHanaTop_HA1_HDB00]
Started: [ bennevis ]
Stopped: [ benromach ]
Master/Slave Set: msl_SAPHana_HA1_HDB00 [rsc_SAPHana_HA1_HDB00]
Masters: [ bennevis ]
Stopped: [ benromach ]
killer (stonith:external/sbd): Started bennevis
HANA_IP (ocf::heartbeat:IPaddr2): Started bennevis
As can be seen, the change from the what-is state (no standby) to the what-if state (standby=on for the HANA primary) would lead the cluster to move the IP address and the HANA primary to the other node and to stop all resources on the former primary node.
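Once the simulated transition matches expectations, the node can actually be put into standby on the live cluster, for example with the crm shell (a sketch, not part of the original document; the node can later be brought back with crm node online benromach):
bennevis:~ # crm node standby benromach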
Additional Information
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.
- Document ID: 7022764
- Creation Date: 22-Mar-2018
- Modified Date: 03-Mar-2020
- SUSE Linux Enterprise High Availability Extension
For questions or concerns with the SUSE Knowledgebase please contact: tidfeedback[at]suse.com